comparison DVN-web/installer/dvninstall/doc/guides/_sources/dataverse-installer-main.txt @ 6:1b2188262ae9

adding the installer.
author "jurzua <jurzua@mpiwg-berlin.mpg.de>"
date Wed, 13 May 2015 11:50:21 +0200

====================================
Installers Guide
====================================

.. _introduction:

**Introduction**

This is our "new and improved" installation guide. It was first
released with version 2.2.4 of the Dataverse Network application, when
we introduced the new, automated and much simplified installation
process. As of February 2012, it has been updated to reflect the
changes made in the newly released version 3.0 of the software. (Our
existing users will notice, however, that the changes in the
installation process have been fairly minimal.)

The guide is intended for anyone who needs to install the DVN app,
developers and Dataverse Network administrators alike.

The chapters and sections are organized top-down, in order of
increasing complexity. First, a very basic installation scenario
is presented. The instructions are straightforward and only the required
components are discussed. This use case will in fact be sufficient for
most DVN developers and many Dataverse Network administrators. Chances
are you are one of these users, so if you are brave by nature, you may stop
reading this section and go straight to the :ref:`“Quick Install” <quick-install>` chapter.

The “basic” installation process described in the first chapter is
fully automated; everything is performed by a single interactive script.
This process has its limitations. It will likely work only on the
supported platforms. Optional components need to be configured outside
of the Installer (these are described in the :ref:`"Optional Components" <optional-components>`
section).

For advanced users, we provide detailed explanations of all the
steps performed by the Installer, so that they can experiment with
individual configuration options, with maximum flexibility and control
over the process. Yet we tried to organize the advanced information in
such a way that those who only need the most basic instructions would
not have to read through it unnecessarily. Instead, we provide them with
an easy way to get a bare-bones configuration of the DVN up and running.

If you are interested in practicing a DVN installation in a Vagrant
environment you can later throw away, please follow the instructions at
https://github.com/dvn/dvn-install-demo to spin up a Linux virtual
machine on your laptop with ``vagrant up``. When you are finished with
this temporary DVN installation, you can delete the virtual machine with
``vagrant destroy``.

If you encounter any problems during installation, please contact the
development team at `support@thedata.org <mailto:support@thedata.org>`__
or our `Dataverse Users
Community <https://groups.google.com/forum/?fromgroups#!forum/dataverse-community>`__.

.. _quick-install:

Quick Install
++++++++++++++++++++++

For an experienced and/or rather bold user, this is a one-paragraph
version of the installation instructions:

This should work on RedHat and its derivatives, and on MacOS X. If this
does not describe your case, you will very likely have to install and
configure at least some of the components manually, meaning you may
want to read through the chapters that follow! Still here? Great.
Prerequisites: Sun/Oracle Java JDK 1.6\_31+ and a “virgin” installation
of Glassfish v3.1.2; PostgreSQL v8.3+, configured to listen to network
connections and support password authentication on the localhost
interface; you may need R as well. See the corresponding sections under
:ref:`Prerequisites <prerequisites>`, if necessary. Download the installer package from
SourceForge:

`http://sourceforge.net/projects/dvn/files/dvn <http://sourceforge.net/projects/dvn/files/dvn>`__

Choose the latest version and download the dvninstall zip file.

Unzip the package in a temp location of your choice (this will create
the directory ``dvninstall``). Run the installer, as root:

| ``cd dvninstall``
| ``./install``

Follow the installation prompts. If it all works as it should, you
will have a working DVN instance running in about a minute from now.

Has it worked? Awesome! Now you may read the rest of the guide
chapters at your own leisurely pace, to see if you need any of the
optional components described there, and/or to understand
what exactly has just been done to your system.

SYSTEM REQUIREMENTS
++++++++++++++++++++++++++++++++++

Or rather, recommendations. The closer your configuration is to what’s
outlined below, the easier it will be for the DVN team to provide
support and answer your questions.

- Operating system - The production version of the Dataverse Network at
  IQSS (dvn.iq.harvard.edu) runs on RedHat Linux 5. Most of the DVN
  development is currently done on MacOS X. Because of our experience
  with RedHat and MacOS X, these are the recommended platforms. You
  should be able to deploy the application .ear file on any other
  platform that supports Java. However, the automated installer we
  provide will likely work on RedHat and MacOS only. Some information
  provided in this guide is specific to these two operating systems. (Any
  OS-specific instructions/examples will be clearly marked, for
  example: ``[MacOS-specific:]``.)

- CPU - The production IQSS Dataverse Network runs on generic,
  multi-core 64-bit processors.

- Memory - The application servers currently in production at the IQSS
  have 64 GB of memory each. Development and testing systems require a
  minimum of 2 GB of memory.

- Disk space - How much disk space is required depends on the amount of
  data that you expect to serve. The IQSS Dataverse Network file system
  is a standalone NetApp with a 2 TB volume dedicated to the DVN data.

- Multiple servers - All the DVN components can run on the same server.
  On a busy, hard-working production network the load can be split
  across multiple servers. The 3 main components, the application
  server (Glassfish), the database (Postgres) and R, can each run on its
  own host. Furthermore, multiple application servers sharing the same
  database and R server(s) can be set up behind a load balancer.
  Developers would normally run Glassfish and Postgres on their
  workstations locally and use a shared R server.

- If it actually becomes a practical necessity to bring up more servers
  to handle your production load, there are no universal instructions
  on how best to spread it across extra CPUs. Whether you’ll benefit
  most from dedicating another server to run the database or to serve
  R requests will depend on the specifics of your site, the nature of
  the data you serve and the needs of your users. Please see the
  discussion in the corresponding sections of the
  :ref:`Prerequisites <prerequisites>` chapter.

.. _prerequisites:

PREREQUISITES
++++++++++++++++++++++++++

In this chapter, we make a point of clearly identifying those
components that are absolutely required for every installation and
marking any advanced, optional instructions as such.

Glassfish
=======================

Version 3.1.2 is required.

Make sure you have **Sun/Oracle Java JDK version 1.6, build 31**
or newer. It is available from
`http://www.oracle.com/technetwork/java/javase/downloads/index.html <http://www.oracle.com/technetwork/java/javase/downloads/index.html>`__.

**[note for developers:]**

If you are doing this installation as part of your DVN software
development setup: The version of NetBeans currently in use by the DVN
team is 7.0.1, and it is recommended that you use this same version if
you want to participate in the development. As of this writing, the
NetBeans 7.0.1 installer bundle comes with an older version of
Glassfish, so you will have to install Glassfish version 3.1.2
separately, and then select it as the default server for your NetBeans
project.

**[/note for developers]**

We **strongly** recommend that you install GlassFish Server 3.1.2,
Open Source Edition, **Full Platform**. You are very likely to run into
installation issues if you attempt to run the installer and get the
application to work with a different version! Simply transitioning from
3.1.1 to 3.1.2 turned out to be a surprisingly complex undertaking,
hence this recommendation to all other installers and developers to stay
with the same version.

It can be obtained from

`http://glassfish.java.net/downloads/3.1.2-final.html <http://glassfish.java.net/downloads/3.1.2-final.html>`__

The page contains a link to the installation instructions; however,
the process is completely straightforward. You are given two options for
the format of the installer package. We recommend that you choose to
download it as a shell archive; you will need to make it executable
with ``chmod +x``, and then run it, as root:

``./installer-filename.sh``

**[Important:]**

Leave the admin password fields blank. This is not a security risk,
since out of the box, Glassfish will only be accepting admin connections
on the localhost interface. Choosing a password at this stage, however, will
complicate the installation process unnecessarily. If this is a
developer's installation, you can probably keep this configuration
unchanged (admin on localhost only). If you need to be able to connect
to the admin console remotely, please see the note in the Appendix
section of the manual.

**[/Important]**

| **[Advanced:]**
| **[Unix-specific:]**

The installer shell script will normally attempt to run in a graphical
mode. If you are installing this on a remote Unix server, this will
require X Windows support on your local workstation. If for whatever
reason it's not available, you have the option of running it in a *silent
mode*; check the download page, above, for more information.

| **[/Unix-specific]**
| **[/Advanced]**

.. _postgresql:

PostgreSQL
=======================

**Version 8.3 or higher is required.**

Installation instructions specific to RedHat Linux and MacOS X are
provided below. Once the database server is installed, you'll need to
configure access control to suit your installation. Note that any
modifications to the configuration files described below require you
to restart Postgres:

| ``service postgresql restart`` (RedHat)
| or
| "Restart Server" under Applications -> PostgreSQL (MacOS X)

By default, most Postgres distributions are configured to listen to network connections on the localhost interface only, and to support only ident authentication. (The MacOS installer may ask you if network connections should be allowed; answer "yes".) At a minimum, if GlassFish is running on the same host, Postgres will also need to allow password authentication on localhost. So you will need to modify the "``host all all 127.0.0.1``" line in your ``/var/lib/pgsql/data/pg_hba.conf`` so that it looks like this:

| ``host all all 127.0.0.1/32 password``

The installer script also needs direct access to the local PostgreSQL server via Unix domain sockets, so "local" connections need to be set to either "trust" or "ident". That is, your ``pg_hba.conf`` must contain one of the two lines below:

| ``local all all ident sameuser``
| or
| ``local all all trust``

("ident" is the default setting; if it has been changed to
"password", "md5", etc. on your system, Postgres will keep prompting
you for the master password throughout the installation.)
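
If you want to double-check your ``pg_hba.conf`` before running the installer, the two requirements above can be expressed as a small script. This is a minimal Python sketch, not part of the DVN installer; the function names ``parse_hba`` and ``check_hba`` are hypothetical, and it only looks for the exact ``all all`` rules shown above. Keep in mind that Postgres uses the first matching line, so an earlier, stricter rule can still shadow the ones this check finds.

```python
# Minimal sketch (not part of the DVN installer): check that pg_hba.conf
# contains the two rules the installer relies on. Assumes the classic
# whitespace-separated format and looks only at exact "all all" rules.


def parse_hba(text):
    """Return the non-comment pg_hba.conf lines, each split into fields."""
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        if line:
            rules.append(line.split())
    return rules


def check_hba(text):
    """Return a list of problems; an empty list means both rules are present."""
    rules = parse_hba(text)
    problems = []

    # Unix-socket access for the installer: "local all all" with ident or trust.
    if not any(r[:3] == ["local", "all", "all"] and r[3:4] in (["ident"], ["trust"])
               for r in rules):
        problems.append("no 'local all all' rule using ident or trust")

    # Password authentication for GlassFish on the loopback interface.
    if not any(r[:5] == ["host", "all", "all", "127.0.0.1/32", "password"]
               for r in rules):
        problems.append("no 'host all all 127.0.0.1/32 password' rule")
    return problems


sample = """
local all all ident sameuser
host  all all 127.0.0.1/32 password   # GlassFish on localhost
"""
print(check_hba(sample))  # prints []
```

To check a live system, pass it the file contents, e.g. ``check_hba(open("/var/lib/pgsql/data/pg_hba.conf").read())`` on RedHat (reading the file usually requires root or the ``postgres`` user).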

**[optional:]**

If GlassFish will be accessing the database remotely, add or modify the following line in your ``<POSTGRES DIR>/data/postgresql.conf``:

| ``listen_addresses='*'``

to enable network connections on all interfaces, and add the following
line to ``pg_hba.conf``:

| ``host all all [ADDRESS] 255.255.255.255 password``

where ``[ADDRESS]`` is the numeric IP address of the GlassFish server.
Using the subnet notation above, you can enable authorization for multiple hosts on your network. For example,

| ``host all all 140.247.115.0 255.255.255.0 password``

will permit password-authenticated connections from all hosts on the ``140.247.115.*`` subnet.

**[/optional]**
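
To see which clients a given ``[ADDRESS] [MASK]`` pair admits, you can use Python's standard ``ipaddress`` module, which accepts the same address/netmask notation. The sketch below is only an illustration (``hba_network`` and ``is_allowed`` are hypothetical helper names); it is handy for confirming that a mask such as ``255.255.255.0`` really covers your GlassFish host.

```python
import ipaddress


def hba_network(address, mask):
    """Turn a pg_hba.conf IP-address/IP-mask pair into a network object."""
    # ip_network accepts "address/netmask"; strict=True makes Python raise
    # ValueError if the address has host bits set outside the mask.
    return ipaddress.ip_network(f"{address}/{mask}", strict=True)


def is_allowed(client_ip, address, mask):
    """True if client_ip falls inside the network described by address + mask."""
    return ipaddress.ip_address(client_ip) in hba_network(address, mask)


print(is_allowed("140.247.115.42", "140.247.115.0", "255.255.255.0"))  # True
print(is_allowed("140.247.116.42", "140.247.115.0", "255.255.255.0"))  # False
print(hba_network("140.247.115.0", "255.255.255.0"))  # 140.247.115.0/24
```

The last line shows the equivalent CIDR form, the same notation used in the ``127.0.0.1/32`` line above.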

| **[RedHat-specific:]**
| **[Advanced:]**

Please note that the instructions below are meant for users who have some experience with basic RedHat admin tasks. You should be safe to proceed if an instruction such as “uninstall the postgres rpms” makes sense to you immediately, i.e., if you already know how to install or uninstall an rpm package. Otherwise, we recommend that you contact your systems administrator.

For RedHat (and relatives), version 8.4 is now available as part of the distribution; however, as of RedHat 5, the default ``postgresql`` rpm is still version 8.1. So you may have to uninstall the ``postgresql`` rpms, then get the ones for version 8.4:

| ``yum install postgresql84 postgresql84-server``

Before starting the server for the first time, you will need to populate the initial database with

| ``service postgresql initdb``

and then start the server with

| ``service postgresql start``

| **[/advanced]**
| **[/RedHat-specific]**

**[MacOS-specific:]**

The PostgreSQL project provides a one-click installer for Mac OS X 10.4 and
above at
`http://www.postgresql.org/download/macosx <http://www.postgresql.org/download/macosx>`__.
Fink and MacPorts packages are also available.

**[/MacOS-specific]**

| **[advanced:]**
| **[optional:]**

See the section :ref:`PostgreSQL setup <postgresql-setup>` in the Appendix for a description of the steps that the automated installer takes to set up PostgreSQL for use with the DVN.

| **[/optional]**
| **[/advanced]**

.. _r-and-rserve:

R and Rserve
=======================

Strictly speaking, R is an optional component. You can bring up a
running DVN instance without it. The automated installer will allow such
an installation, with a warning. Users of this Dataverse Network will be
able to upload and share some data. Only the advanced modes of serving
quantitative data to the users require R. Please consult
the :ref:`"Do you need R?" <do-you-need-r>` section in the Appendix for an extended discussion of this.

**Installation instructions:**

Install the latest version of R from your favorite CRAN mirror (refer to `http://cran.r-project.org/ <http://cran.r-project.org/>`__ for more information). Depending on your OS distribution, this may be as simple as typing

| **[RedHat/Linux-specific:]**

``yum install R R-devel``

(For example, the above line will work in CentOS out of the box; in RedHat, you will first have to add support for the EPEL repository, see
`http://fedoraproject.org/wiki/EPEL <http://fedoraproject.org/wiki/EPEL>`__,
and then run the ``yum install`` command.)

| **[/RedHat/Linux-specific]**

Please make sure to install the "devel" package too! You will need it
to build the extra R modules.

Once you have R installed, download the package ``dvnextra.tar`` from this location:

`http://dvn.iq.harvard.edu/dist/R/dvnextra.tar <http://dvn.iq.harvard.edu/dist/R/dvnextra.tar>`__

Unpack the archive:

``tar xvf dvnextra.tar``

then run the supplied installation shell script as root:

| ``cd dvnextra``
| ``./installModules.sh``

This will install a number of R modules needed by the DVN to run statistics and analysis, some from CRAN and some supplied in the bundle; it will also configure Rserve to run locally on your system and install some startup files that the DVN will need.

**Please note that the DVN application requires specific versions of the 3rd-party R packages. For example, if you obtain and install the version of the Zelig package currently available from CRAN, it will not work with the application. This is why we distribute the sources of the correct versions in this tar package.**

| **[advanced:]**

We haven’t had much experience with R on any platforms other than RedHat and the like. Our developers use MacOS X, but point their DVN instances to a shared server running Rserve under RedHat.

The R project ports their distribution to a wide range of platforms. However, the installer shell script above will only run on Unix, and is not really guaranteed to work on anything other than RedHat. If you have some experience with either R or system administration, you should be able to use the script as a guide to re-create the configuration steps on any other platform quite easily. You will, however, be entirely on your own while embarking on that adventure.

**[/advanced]**

System Configuration
================================

**[Advanced/optional:]**

Many modern OS distributions come pre-configured so that all the
network ports are firewalled off by default.

Depending on the configuration of your server, you may need to open some
of the following ports.

On a developer's personal workstation, the user would normally access his
or her DVN instance on the localhost interface, so no open ports are
required unless you want to give access to your DVN to another
user/developer.

When running a DVN that is meant to be accessible by network users, at a
minimum, if all the components are running on the same server, the HTTP
port 80 needs to be open. You may also want to open TCP port 443, to be able
to access the Glassfish admin console remotely.

If the DVN is running its own HANDLE.NET server (see
:ref:`Optional Components <optional-components>`), TCP port 8000 and TCP/UDP port 2641 are
also needed.

If the DVN application needs to talk to PostgreSQL and/or Rserve running
on remote hosts, TCP ports 5432 and 6311, respectively, need to be
open there.

**[/Advanced/optional]**
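
Before going further, it can be useful to confirm from the GlassFish host that the ports listed above are actually reachable. The Python sketch below makes a plain TCP connection attempt; the port table simply restates this section, and ``DVN_PORTS``, ``tcp_port_open`` and ``report`` are hypothetical names. It cannot test the UDP side of port 2641, and a ``False`` result only tells you that the connection failed (firewall, service not running, or wrong host), not which of these is the cause.

```python
import socket

# TCP ports named in this section (2641 is also used over UDP, which a
# connection attempt like the one below cannot test).
DVN_PORTS = {
    "http": 80,
    "https-admin": 443,
    "handle-http": 8000,
    "handle": 2641,
    "postgresql": 5432,
    "rserve": 6311,
}


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, unknown host, ...
        return False


def report(host, names):
    """Map each named DVN port to whether it accepts TCP connections on host."""
    return {name: tcp_port_open(host, DVN_PORTS[name]) for name in names}
```

For example, ``report("db.example.edu", ["postgresql", "rserve"])``, with your own database/R host in place of the made-up ``db.example.edu``, returns a dictionary such as ``{"postgresql": True, "rserve": False}`` telling you which of the two services accepted a connection.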


RUNNING THE INSTALLER
+++++++++++++++++++++++++++++++++++++++++

Once the :ref:`Prerequisites <prerequisites>` have been taken care of, the DVN application can be installed.

The installer package can be downloaded from our repository on SourceForge at

`http://sourceforge.net/projects/dvn/files/dvn/3.0/dvninstall\_v3\_0.zip <http://sourceforge.net/projects/dvn/files/dvn/3.0/dvninstall_v3_0.zip>`_

Unzip the package in a temp location of your choice (this will create the directory ``dvninstall``). Run the installer, as root:

| ``cd dvninstall``
| ``./install``

Follow the installation prompts. The installer will first verify the contents of the package and check if the required components
(in :ref:`Prerequisites <prerequisites>`) are present on the system. Then it will lead you through the application setup.

| **[Advanced:]**

The limitations of the installer package:

- Some extra configuration steps will be required if the PostgreSQL database is being set up on a remote server.

- It will most likely only work on the supported platforms, RedHat and Mac OS X.

- It is only guaranteed to work on a fresh Glassfish installation. If you already have more than one Glassfish domain created and/or have applications other than the DVN running under Glassfish, please consult the :ref:`"What does the Installer do?" <what-does-the-intstaller-do>` section.

- It does not install any of the optional components (see :ref:`Optional Components <optional-components>`).

For a detailed explanation of the tasks performed by the Installer, see the :ref:`"What does the Installer do?" <what-does-the-intstaller-do>` section.

| **[/Advanced]**

.. _optional-components:

Optional Components
++++++++++++++++++++++++++

``[The sections on ImageMagick, Google Analytics and Captcha have been rewritten and, hopefully, made less confusing. The Handles instructions have also been modified, but I would like to work on it some more. Namely I'd like to read their own technical manual, and see if we should provide our own version of installation instructions, similarly to what we do with some other packages; we've heard complaints from users about their manual not being very easy to follow]``

reCAPTCHA bot blocker
=================================

We found that our “email us” feature can be abused to send spam
messages. You can choose to use the reCAPTCHA filter to help prevent
this. Configure the filter as follows:

#. Go to the reCAPTCHA web site at
   `http://recaptcha.net/ <http://recaptcha.net/>`_
   and sign up for an account. Register your website domain to acquire
   a public/private CAPTCHA key pair, and record this information in a
   secure location.
#. Insert the public/private key pair and domain for your reCAPTCHA
   account into the ``captcha`` table of the DVN PostgreSQL database.
   Use ``psql``, ``pgadmin`` or any other database utility; the SQL
   query will look like this:
   ``INSERT INTO captcha (publickey, domainname, privatekey) VALUES ('sample', 'sample.edu', 'sample');``
#. Verify that the Report Issue page is now showing the reCAPTCHA
   challenge.

Google Analytics
================================

Network Admins can use the Google Analytics tools to view Dataverse Network website usage statistics.

Note: It takes about 24 hours for Google Analytics to start monitoring
your website after the registration.

To enable the use of Google Analytics:

#. Go to the Google Analytics homepage at
   `http://www.google.com/analytics/indexu.html <http://www.google.com/analytics/indexu.html>`__.
#. Set up a Google Analytics account and obtain a tracking code for your Dataverse Network installation.
#. Use the Google Analytics Help Center to find out how to add the tracking code to the content you serve.
#. Configure the DVN to use the tracking key (obtained in Step 2,
   above) by setting the ``dvn.googleanalytics.key`` JVM option in
   Glassfish.

   This can be done by adding the following directly to the
   ``domain.xml`` config file (for example: ``/usr/local/glassfish/domains/domain1/config/domain.xml``):
   ``<jvm-options>-Ddvn.googleanalytics.key=XX-YYY</jvm-options>`` (this will require a Glassfish restart).

   Alternatively, you can use the Glassfish Admin Console configuration GUI; consult the “Glassfish Configuration” section in the Appendix.

Once installed and activated, the usage statistics can be accessed from
the Network Options of the DVN.
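
If you prefer to script the ``domain.xml`` change rather than edit it by hand, the following Python sketch appends a ``<jvm-options>`` element using the standard ``xml.etree.ElementTree`` module. It assumes the usual GlassFish 3.x layout, where the server's JVM options live under ``<configs><config name="server-config"><java-config>``; ``add_jvm_option`` and the ``-Xmx512m`` sample entry are hypothetical. ElementTree does not preserve XML comments or the XML declaration, so keep a backup copy of ``domain.xml``, and restart Glassfish afterwards as noted above.

```python
import xml.etree.ElementTree as ET


def add_jvm_option(domain_xml, option, config_name="server-config"):
    """Return domain_xml with <jvm-options>option</jvm-options> added to the
    <java-config> of the named config, unless that exact option is already set.

    Assumes the GlassFish 3.x layout <domain><configs><config name=...><java-config>.
    """
    root = ET.fromstring(domain_xml)
    java_config = root.find(f"./configs/config[@name='{config_name}']/java-config")
    if java_config is None:
        raise ValueError(f"no <java-config> found for config {config_name!r}")
    existing = {e.text for e in java_config.findall("jvm-options")}
    if option not in existing:
        ET.SubElement(java_config, "jvm-options").text = option
    # Note: ElementTree drops XML comments and the <?xml ...?> declaration.
    return ET.tostring(root, encoding="unicode")


sample = """<domain>
  <configs>
    <config name="server-config">
      <java-config><jvm-options>-Xmx512m</jvm-options></java-config>
    </config>
  </configs>
</domain>"""
print(add_jvm_option(sample, "-Ddvn.googleanalytics.key=XX-YYY"))
```

Running the helper twice with the same option leaves a single copy, so it is safe to re-run.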

ImageMagick
=======================

When image files are ingested into a DVN, the application
automatically creates small "thumbnail" versions to display on the
Files View page. These thumbnails are generated once, then cached for
future use.

Normally, the standard Java image manipulation libraries are used to
do the scaling. If you have studies with large numbers of large
images, generating the thumbnails may become a time-consuming task. If
you notice that the Files View page takes a long time to load for the first
time because of the images, it is possible to improve the
performance by installing the ``ImageMagick`` package. If it is
installed, the application will automatically use its
``/usr/bin/convert`` utility to do the resizing, which appears to be
significantly faster than the Java code.

``ImageMagick`` is available for, or even comes with, most of the popular OS distributions.

| **[RedHat-specific:]**

It is part of the full RedHat Linux distribution, although it is not
included in the default "server" configuration. It can be installed on a
RedHat server with the ``yum install ImageMagick`` command.

**[/RedHat-specific]**
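
The DVN makes the choice between ``convert`` and its Java code internally; the Python sketch below only illustrates that check-and-fallback idea, for instance if you want to try ``convert`` by hand on one of your images first. The 64-pixel default size and the helper names are hypothetical, not the DVN's actual settings. ImageMagick's ``-resize 64x64`` geometry scales the image to fit within a 64 by 64 box while keeping its aspect ratio.

```python
import os
import subprocess

CONVERT = "/usr/bin/convert"  # the ImageMagick utility named in the text


def convert_available(path=CONVERT):
    """True if an executable file exists at path."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def thumbnail_command(src, dst, size=64, convert=CONVERT):
    """argv that scales src to fit within size x size pixels (aspect ratio kept)."""
    return [convert, src, "-resize", f"{size}x{size}", dst]


def make_thumbnail(src, dst, size=64):
    """Resize with ImageMagick if present; otherwise signal that a fallback is needed."""
    if convert_available():
        subprocess.run(thumbnail_command(src, dst, size), check=True)
        return "imagemagick"
    return "fallback"  # the DVN would use its Java image code here
```

For example, ``make_thumbnail("photo.jpg", "photo_thumb.png")`` runs ``/usr/bin/convert photo.jpg -resize 64x64 photo_thumb.png`` when ImageMagick is installed, and returns ``"fallback"`` otherwise.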

Handle System
===========================

DVN administrators may choose to set up a `HANDLE.NET <http://www.handle.net/>`_ server to issue and register persistent, global identifiers for their studies. The DVN app can be modified to support other naming services, but as of now it comes
pre-configured to use Handles.

To install and set up a local HANDLE.NET server:

#. Download HANDLE.NET.
   Refer to the HANDLE.NET software download page at
   `http://handle.net/download.html <http://handle.net/download.html>`__.
#. Install the server on the same host as GlassFish.
   Complete the installation and setup process as described in the
   HANDLE.NET Technical Manual:
   `http://www.handle.net/tech_manual/Handle_Technical_Manual.pdf <http://www.handle.net/tech_manual/Handle_Technical_Manual.pdf>`__.
#. Accept the default settings during installation, **with one
   exception:** do not encrypt private keys (this will make it easier to
   manage the service). This means you should answer 'n' when
   prompted "Would you like to encrypt your private key?(y/n). [y]:". If
   you accept the default 'y' and then just press Return when prompted for a
   passphrase, the key **will** still be encrypted, with a blank passphrase!
#. During the installation you will be issued an "authority prefix".
   This is the equivalent of a domain name. For example, the prefix
   registered to the IQSS DVN is "1902.1". The IDs issued to IQSS
   studies are of the form "1902.1/XXXX", where "XXXX" is some unique
   identifier.
#. Use ``psql`` or ``pgAdmin`` to execute the following SQL command:
   ``insert into handleprefix (prefix) values ('<your HANDLE.NET prefix>');``
#. ``(Optional/advanced)`` If you are going to be assigning HANDLE.NET
   ids in more than one authority prefix (to register studies harvested
   from remote sources): once you obtain the additional HANDLE.NET
   prefixes, add each to the ``handleprefix`` table, using the SQL
   command from step 5.
#. Use ``psql`` or ``pgAdmin`` to execute the following SQL
   command: ``update vdcnetwork set handleregistration=true, authority='<your HANDLE.NET prefix>';``
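
Handles are strings of the form ``<prefix>/<local id>``, as in the ``1902.1/XXXX`` example above. When you need to tell which registered prefix a study identifier belongs to (for example, once you have several rows in ``handleprefix``), it helps to split handles at the first slash. The Python sketch below does just that, assuming, as in the Handle System syntax, that the prefix itself never contains a slash; the helper names and the local id ``10001`` are hypothetical.

```python
def make_handle(prefix, local_id):
    """Compose a handle of the form '<authority prefix>/<local id>'."""
    if not prefix or "/" in prefix:
        raise ValueError("prefix must be non-empty and must not contain '/'")
    return f"{prefix}/{local_id}"


def split_handle(handle):
    """Split a handle at the first '/' into (prefix, local id)."""
    prefix, sep, local_id = handle.partition("/")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a handle: {handle!r}")
    return prefix, local_id


def has_registered_prefix(handle, prefixes):
    """True if the handle's prefix is one of the given (handleprefix) prefixes."""
    return split_handle(handle)[0] in prefixes


print(make_handle("1902.1", "10001"))  # 1902.1/10001
```

Because only the first slash separates the two parts, a local id may itself contain slashes: ``split_handle("1902.1/abc/def")`` gives ``("1902.1", "abc/def")``.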

Note: The DVN app comes bundled with the HANDLE.NET client libraries.
You do not need to install these separately.

Twitter setup
======================

To give users the ability to enable Automatic Tweets in your
Dataverse Network:

#. You will first need to tell Twitter about your Dataverse Network application. Go to `https://dev.twitter.com/apps <https://dev.twitter.com/apps>`_ and log in (or create a new Twitter account).
#. Click "Create a new application".
#. Fill out all the fields. For the callback URL, use your Dataverse Network home page URL.
#. Once the application is created, go to the Settings tab and set Application Type to "Read and Write". You can optionally also upload an application
   icon and fill out the organization details (the end user will see these).
#. Click Details again. You will need both the consumer key and the consumer secret as JVM options. Add them via the Glassfish console:

   | ``-Dtwitter4j.oauth.consumerKey=***``
   | ``-Dtwitter4j.oauth.consumerSecret=***``

#. Restart Glassfish.
#. To verify that Automatic Tweets are now properly set up, go to the Dataverse Network Options page or any Dataverse Options page and check that there is a new option, "Enable Twitter".

Digital Object Identifiers
==========================

Beginning with version 3.6, DVN will support the use of Digital Object Identifiers (DOIs). Similar to the currently enabled Handle System, these DOIs will provide a permanent link to studies in a DVN network.

DVN uses the EZID API (`www.n2t.net/ezid <http://www.n2t.net/ezid>`__) to facilitate the creation and maintenance of DOIs. Network administrators will have to arrange to get their own account with EZID in order to enable the creation of DOIs. Once an account has been set up, the following settings must be made in your DVN setup:

Update your database: use ``psql`` or ``pgAdmin`` to execute the following SQL command:

``update vdcnetwork set handleregistration=true, protocol = 'doi', authority='<the namespace associated with your EZID account>' where id = 0;``

Add the following JVM options:

``-Ddoi.username=<username of your EZID account>``

``-Ddoi.password=<password of your EZID account>``

``-Ddoi.baseurlstring=https://ezid.cdlib.org``

Note: The DVN app comes bundled with the EZID API client libraries. You do not need to install these separately.
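
The JVM options in the Twitter and DOI sections can also be added from the command line with GlassFish's ``asadmin create-jvm-options`` subcommand instead of the console. One catch: ``asadmin`` uses ``:`` to separate several options, so a colon inside a value, such as the one in ``https://ezid.cdlib.org``, has to be escaped with a backslash. The Python sketch below (``asadmin_jvm_option`` is a hypothetical helper) only builds and prints such command lines, quoted for the shell; it does not run them. Run the printed commands while Glassfish is up and then restart it. Note that a password passed this way may end up in your shell history.

```python
import shlex


def asadmin_jvm_option(option):
    """Build an 'asadmin create-jvm-options' command line for a single JVM option.

    asadmin treats ':' as a separator between several options, so colons
    inside the option are escaped with a backslash; shlex.quote then keeps
    the shell from interpreting the backslash or any other special characters.
    """
    escaped = option.replace(":", "\\:")
    return "asadmin create-jvm-options " + shlex.quote(escaped)


# Placeholder values in angle brackets must be replaced with your own.
for opt in [
    "-Ddoi.username=<username of your EZID account>",
    "-Ddoi.baseurlstring=https://ezid.cdlib.org",
    "-Dtwitter4j.oauth.consumerKey=<consumer key>",
]:
    print(asadmin_jvm_option(opt))
```

For the EZID base URL, this prints ``asadmin create-jvm-options '-Ddoi.baseurlstring=https\://ezid.cdlib.org'``.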
596
597 Appendix
598 +++++++++++++++++++++++
599
600 .. _do-you-need-r:
601
602 Do you need R?
603 ==========================
604
605 This is a more detailed explanation of the statement made earlier in the "Prerequisites" section: "Only the advanced modes of serving quantitative data to the users require R." ``[style?]``
606
607 In this context, by “quantitative data” we mean data sets for which
608 machine-readable, variable-level metadata has been defined in the DVN
609 database. “Subsettable data” is another frequently used term, in the
610 DVN parlance. The currently supported sources of subsettable data are
611 SPSS and STATA files, as well as row tabulated or CSV files, with
612 extra control cards defining the data structure and variable
613 metadata. (See full documentation in User Guide for :ref:`Finding and Using Data <finding-and-using-data>`
614
615 Once a “subsettable” data set is create, users can run online statistics and analysis on it. That’s where R is used. In our experience, most of the institutions who have installed the DVN did so primarily in order to share and process quantitative data. When this is the case, R must be considered a required component. But a DVN network built to serve a collection of strictly human-readable (text, image, etc.) data, R will not be necessary at all.
616
617 .. _what-does-the-intstaller-do:
618
619 What does the Installer do?
620 ===================================
621
622 The Installer script (chapters Quick Install, Running the Installer.) automates the following tasks:

#. Checks the system for required components;
#. Prompts the user for the following information:

   a) Location of the Glassfish directory;

   b) Access information (host, port, database name, username, password) for PostgreSQL;

   c) Access information (host, port, username, password) for Rserve;

#. Attempts to create the PostgreSQL user (role) and database (the :ref:`prerequisite PostgreSQL setup step <postgresql>` above); see the :ref:`"PostgreSQL configuration"<postgresql-setup>` Appendix section for details.
#. Using the :ref:`Glassfish configuration template <glassfish-configuration-template>` (see the Appendix) and the information collected in step 2.b. above, creates the config file domain.xml and installs it in the Glassfish domain directory.
#. Copies additional configuration files (supplied in the dvninstall/config directory of the Installer package) into the config directory of the Glassfish domain.
#. Installs the Glassfish Postgres driver (supplied in the dvninstall/pgdriver directory of the Installer package) into the lib directory in the Glassfish installation tree.
#. Attempts to start Glassfish. The config file at this point contains the configuration settings that the DVN will need to run (see the :ref:`Glassfish Configuration, individual settings <glassfish-configuration-individual-settings>` section of the Appendix), but otherwise it is a "virgin", fresh config. Glassfish will perform some initialization tasks on this first startup and deploy some internal apps.
#. If step 7 succeeds, the Installer attempts to deploy the DVN application (the Java archive DVN-EAR.ear supplied with the installer).
#. Stops Glassfish, populates the DVN database with the initial content (see the :ref:`"PostgreSQL configuration"<postgresql-setup>` section of the Appendix), and starts Glassfish again.
#. Attempts to establish a connection to Rserve, using the access information obtained during step 2.c. If this fails, prints a warning message and points the user to the Prerequisites section of this guide, where R installation is discussed.
#. Finally, prints a message informing the user that their new DVN should be up and running, provides them with the server URL and suggests that they visit it to change the default passwords and perhaps start setting up their Dataverse Network.

Throughout the steps above, the Installer attempts to diagnose any
potential issues and give the user clear error messages when things go
wrong ("version of Postgres too old", "you must run this as root",
etc.).
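
The kind of checks described above can be approximated with a few shell functions. This is an illustrative sketch, not the Installer's own code: the function names are hypothetical, and the ``psql --version`` output format that ``postgres_version`` parses is an assumption about your PostgreSQL client.

.. code-block:: bash

   # Hypothetical pre-flight checks, modeled on the diagnostics the
   # Installer performs. Not taken from the Installer itself.

   # Fail unless the script runs as root.
   require_root() {
       if [ "$(id -u)" -ne 0 ]; then
           echo "you must run this as root" >&2
           return 1
       fi
   }

   # Fail if any of the named commands is missing from the PATH.
   require_commands() {
       missing=0
       for cmd in "$@"; do
           if ! command -v "$cmd" >/dev/null 2>&1; then
               echo "required command not found: $cmd" >&2
               missing=1
           fi
       done
       return "$missing"
   }

   # Succeed if dotted version $1 is at least $2 (major.minor only).
   # Appending ".0" makes bare major versions such as "10" parse too.
   version_at_least() {
       cur=$1.0 min=$2.0
       cur_major=${cur%%.*}; cur_rest=${cur#*.}; cur_minor=${cur_rest%%.*}
       min_major=${min%%.*}; min_rest=${min#*.}; min_minor=${min_rest%%.*}
       [ "$cur_major" -gt "$min_major" ] ||
           { [ "$cur_major" -eq "$min_major" ] && [ "$cur_minor" -ge "$min_minor" ]; }
   }

   # Version of the local psql client, assuming output such as
   # "psql (PostgreSQL) 9.6.24", where the third field is the version.
   postgres_version() {
       psql --version | awk '{ print $3 }'
   }

For example: ``require_root && require_commands psql java && version_at_least "$(postgres_version)" 8.3``. The 8.3 threshold is only a placeholder; use whatever minimum your DVN release actually requires.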

Enough information is supplied in this manual to enable a user (a
skilled and rather patient user, we may add) to perform all the steps
above without the use of the script.

.. _glassfish-configuration-template:

Glassfish configuration template
====================================

The configuration template (``domain.xml.TEMPLATE``) is part of the
installer zip package. The installer replaces the placeholder
configuration tokens (for example, ``%POSTGRES_DATABASE%``) with the
real values provided by the user to create the Glassfish configuration
file ``domain.xml``.

The template itself is not reproduced here, since it is about 30 KB of
XML. Refer to the copy in the installer package if you need to consult
it.
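
The token substitution can be illustrated with a small shell function. Only ``%POSTGRES_DATABASE%`` is a token named in this guide; the function name and the ``%DB_USER%`` token in the usage line are hypothetical, and the actual installer may perform the substitution differently.

.. code-block:: bash

   # Hypothetical sketch: replace %NAME% tokens in a template with values.
   # Usage: fill_template TEMPLATE OUTPUT NAME=VALUE...
   fill_template() {
       template=$1
       output=$2
       shift 2
       cp "$template" "$output" || return 1
       for pair in "$@"; do
           # Values go through the environment rather than "awk -v", so
           # backslashes (e.g. in passwords) are not interpreted by awk.
           TOKEN="%${pair%%=*}%" VALUE="${pair#*=}" awk '{
               tok = ENVIRON["TOKEN"]; val = ENVIRON["VALUE"]; out = ""
               # Literal (not regex) replacement of every occurrence.
               while ((i = index($0, tok)) > 0) {
                   out = out substr($0, 1, i - 1) val
                   $0 = substr($0, i + length(tok))
               }
               print out $0
           }' "$output" > "$output.tmp" && mv "$output.tmp" "$output" || return 1
       done
   }

For example, ``fill_template domain.xml.TEMPLATE domain.xml POSTGRES_DATABASE=dvnDb DB_USER=dvnApp`` writes a copy of the template with both tokens replaced. The replacement is literal, so values containing slashes, ampersands or backslashes are copied unchanged, which a naive ``sed`` substitution would not guarantee.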

.. _glassfish-configuration-individual-settings:

Glassfish Configuration, individual settings
=====================================================

As explained earlier in the Appendix, the Installer configures Glassfish
by generating a complete domain configuration file (``domain.xml``) and
installing it in the domain directory.

However, all of these settings and options can also be configured
individually by an operator, using the Glassfish Admin Console.

The Console can be accessed at network port 4848 while Glassfish is
running, by pointing a browser at

``http://[your host name]:4848/``

and logging in as ``admin``. The initial password is ``adminadmin``. It
is of course strongly recommended that you log in and change it first
thing after you run the Installer.

The sections below describe all the configuration settings that would
need to be done through the GUI in order to replicate the configuration
file produced by the Installer. This information is provided for the
benefit of an advanced user who may want to experiment with individual
options, or who may attempt to install DVN on a platform not supported
by our installer (although we sincerely hope that nobody is ever driven
to such desperate measures).

.. _jvm-options:

JVM options
-----------------------

Under Application Server->JVM Settings->JVM Options:

If you are installing Glassfish in a production environment, follow
these steps:

#. | Delete the following options:
   | ``-Dsun.rmi.dgc.server.gcInterval=3600000``
   | ``-Dsun.rmi.dgc.client.gcInterval=3600000``
#. | Add the following options:
   | ``-XX:MaxPermSize=192m``
   | ``-XX:+AggressiveHeap``
   | ``-Xss128k``
   | ``-XX:+DisableExplicitGC``
   | ``-Dcom.sun.enterprise.ss.ASQuickStartup=false``
#. | To install on a multi-processor machine, add the following:
   | ``-XX:+UseParallelOldGC``
#. | To enable the optional HANDLE.NET installation and provide access to
     study ID registration, add the following (see the "Handles System"
     section in "Optional Components" for details):
   | ``-Ddvn.handle.baseUrl=<Dataverse Network host URL>/dvn/study?globalId=hdl:``
   | ``-Ddvn.handle.auth=<authority>``
   | ``-Ddvn.handle.admcredfile=/hs/svr_1/admpriv.bin``
#. | To enable the optional Google Analytics option on the Network Options
     page and provide access to site usage reports, add the following (see
     the "Google Analytics" section in "Optional Components" for details):
   | ``-Ddvn.googleanalytics.key=<googleAnalyticsTrackingCode>``
#. | Configure the following option only if you run multiple instances
     of the GlassFish server for load balancing. This option controls
     which GlassFish instance runs scheduled jobs, such as harvest or
     export.
   | For the server instance that will run scheduled jobs, include the
     following JVM option:
   | ``-Ddvn.timerServer=true``
   | For all other server instances, include this JVM option:
   | ``-Ddvn.timerServer=false``

If you are installing Glassfish in either a production or development
environment, follow these steps:

- | Change the following options’ settings:
  | Change ``-client`` to ``-server``.
  | Change ``-Xmx512m`` to whatever size you can allot for the maximum
    Java heap space.
  | Set ``-Xms512m`` to the same value to which you set ``-Xmx512m``.
- | To configure permanent file storage (data and documentation files
    uploaded to studies), set the following:
  | ``-Dvdc.study.file.dir=${com.sun.aas.instanceRoot}/config/files/studies``
- | To configure the temporary location used in file uploads, add the
    following:
  | ``-Dvdc.temp.file.dir=${com.sun.aas.instanceRoot}/config/files/temp``
- | To configure export and import logs (harvesting and importing),
    add the following:
  | ``-Dvdc.export.log.dir=${com.sun.aas.instanceRoot}/logs/export``
  | ``-Dvdc.import.log.dir=${com.sun.aas.instanceRoot}/logs/import``
- | Add the following:
  | ``-Djhove.conf.dir=${com.sun.aas.instanceRoot}/config``
  | ``-Ddvn.inetAddress=<host or fully qualified domain name of server on which Dataverse Network runs>``
  | ``-Ddvn.networkData.libPath=${com.sun.aas.instanceRoot}/applications/j2ee-apps/DVN-EAR``
- | To manage calls to RServe and the R host (analysis and file upload), add
    the following:
  | ``-Dvdc.dsb.host=<RServe server hostname>``
  | ``-Dvdc.dsb.rserve.user=<account>``
  | ``-Dvdc.dsb.rserve.pwrd=<password>``
  | ``-Dvdc.dsb.rserve.port=<port number>``

  | For installing R, see
    :ref:`R and R-Serve <r-and-rserve>`
    for information about configuring these values in the ``Rserv.conf``
    file.
  | These settings must be configured for subsetting and analysis to
    work.
- | To configure search index files, set the following:
  | ``-Ddvn.index.location=${com.sun.aas.instanceRoot}/config``
- | To use the optional customized error logging and add more information
    to your log files, set the following:
  | ``-Djava.util.logging.config.file=${com.sun.aas.instanceRoot}/config/logging.properties``
  | **Note**: To customize the logging, edit the ``logging.properties`` file.
- | The default size limit for file downloads is 100MB. To override this
    default, add the following JVM option:
  | ``-Ddvn.batchdownload.limit=<max download bytes>``
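
If you prefer the command line to the Admin Console, the same options can be managed with the ``asadmin create-jvm-options`` and ``delete-jvm-options`` subcommands. The sketch below assumes that ``asadmin`` is on the ``PATH`` (or is named by an ``ASADMIN`` variable) and that the domain is running; the helper names and the ``DRY_RUN`` switch are hypothetical. Check ``asadmin create-jvm-options --help`` for your Glassfish version before relying on the exact quoting.

.. code-block:: bash

   # Hypothetical helpers: manage JVM options with asadmin instead of
   # the Admin Console. DRY_RUN is our own switch, not part of Glassfish.
   ASADMIN=${ASADMIN:-asadmin}

   # Run a command, or only print it when DRY_RUN=1.
   run() {
       if [ "${DRY_RUN:-0}" = 1 ]; then
           printf '%s\n' "$*"
       else
           "$@"
       fi
   }

   # asadmin uses ':' to separate JVM options, so a colon inside a single
   # option (e.g. -XX:MaxPermSize) must be escaped with a backslash.
   escape_colons() {
       printf '%s\n' "$1" | sed 's/:/\\:/g'
   }

   add_jvm_option() {
       run "$ASADMIN" create-jvm-options "$(escape_colons "$1")"
   }

   delete_jvm_option() {
       run "$ASADMIN" delete-jvm-options "$(escape_colons "$1")"
   }

For example, ``DRY_RUN=1 add_jvm_option '-XX:MaxPermSize=192m'`` prints ``asadmin create-jvm-options -XX\:MaxPermSize=192m``. Put values containing ``${com.sun.aas.instanceRoot}`` in single quotes, so that the shell passes the variable through for Glassfish to expand.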

EJB Container
-----------------------------

Under Configuration->EJB Container->EJB Timer Service:

#. | Set the Timer Datasource to the following:
   | ``jdbc/VDCNetDS``
#. | Save the configuration.

HTTP Service
-----------------------------

The HTTP Service configuration settings described in this section are suggested defaults. These settings are very important. There are no universally correct values; the right values depend on the specifics of your web traffic, how many requests you get, how long they take to process on average, and your hardware. For detailed information, refer to the
Sun Microsystems Documentation web site at the following URL:

`http://docs.sun.com/ <http://docs.sun.com/>`_

| **Note**: If your server becomes so busy that it drops connections,
  adjust the Thread Counts to improve performance.

#. Under Configuration->HTTP Service->HTTP
   Listeners->\ ``http-listener-1``:

   - Listener Port: 80
   - Acceptor Threads: The number of CPUs (cores) on your server

#. Under Configuration->HTTP Service, in the RequestProcessing tab:

   - Thread Count: Four times the number of CPUs (cores) on your server
   - Initial Thread Count: The number of CPUs (cores)

#. Under Configuration->HTTP Service->Virtual Servers->server: add a new property ``allowLinking`` with the value ``true``.

#. Under Configuration->HTTP Service, configure Access Logging:

   | ``format=%client.name% %auth-user-name% %datetime% %request% %status% %response.length%``
   | ``rotation-enabled=true``
   | ``rotation-interval-in-minutes=15``
   | ``rotation-policy=time``
   | ``rotation-suffix=yyyy-MM-dd``
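
The thread-count rules of thumb above are simple arithmetic on the number of cores. The sketch below computes them; it assumes ``getconf _NPROCESSORS_ONLN`` is available (true on Linux and most Unix systems, though not required by POSIX), and the ``name=value`` labels it prints are illustrative rather than Glassfish's own setting names.

.. code-block:: bash

   # Hypothetical helper: suggested HTTP Service thread settings for N
   # cores. Pass the core count explicitly, or let getconf detect it.
   http_thread_settings() {
       cores=${1:-$(getconf _NPROCESSORS_ONLN)}
       printf 'acceptor-threads=%d\n' "$cores"          # one per core
       printf 'thread-count=%d\n' "$((cores * 4))"      # four per core
       printf 'initial-thread-count=%d\n' "$cores"      # one per core
   }

On an 8-core server, ``http_thread_settings 8`` reports 8 acceptor threads, a thread count of 32 and an initial thread count of 8.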

JavaMail Session
------------------------------------

Under Resources->JavaMail Sessions->\ ``mail/notifyMailSession``:

- | Mail Host: ``<your mail server>``
  | **Note**: The Project recommends that you install a mail server on the same machine as GlassFish and use ``localhost`` for this entry. Since email notification is used for workflow events such as creating a dataverse or study, these functions may not work properly if a valid mail server is not configured.
- Default User: ``dataversenotify``. This does not need to be a real mail account.
- Default Return Address: ``do-not-reply@<your mail server>``

JDBC Resources
------------------------------------

**Under Resources->JDBC->Connection Pools:**

Add a new Connection Pool entry:

- Name: ``dvnDbPool``
- Resource Type: ``javax.sql.DataSource``
- Database Vendor: ``PostgreSQL``
- DataSource ClassName: ``org.postgresql.ds.PGPoolingDataSource``
- Additional Properties:

  - ConnectionAttributes: ``;create=true``
  - User: ``dvnApp``
  - PortNumber: ``5432`` (Port 5432 is the PostgreSQL default port.)
  - Password: ``<Dataverse Network application database password>``
  - DatabaseName: ``<your database name>``
  - ServerName: ``<your database host>``
  - JDBC30DataSource: ``true``

**Under Resources->JDBC->JDBC Resources:**

Add a new JDBC Resources entry:

- JNDI Name: ``jdbc/VDCNetDS``
- Pool Name: ``dvnDbPool``
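
The same pool and resource can also be created from the command line with ``asadmin create-jdbc-connection-pool`` and ``create-jdbc-resource``. This is a sketch, assuming ``asadmin`` is on the ``PATH`` and the domain is running; the function name and ``DRY_RUN`` switch are hypothetical, and the ``ConnectionAttributes`` property is left out because its value contains characters that may need extra escaping inside ``--property``.

.. code-block:: bash

   # Hypothetical helper: create the dvnDbPool connection pool and the
   # jdbc/VDCNetDS resource. DRY_RUN is our own switch.
   ASADMIN=${ASADMIN:-asadmin}

   # Run a command, or only print it when DRY_RUN=1.
   run() {
       if [ "${DRY_RUN:-0}" = 1 ]; then
           printf '%s\n' "$*"
       else
           "$@"
       fi
   }

   # Usage: create_dvn_jdbc DB_NAME DB_HOST DB_USER DB_PASSWORD
   # Properties are ':'-separated, so a password containing ':' would
   # have to be escaped as '\:' before being passed in.
   create_dvn_jdbc() {
       props="user=$3:password=$4:databaseName=$1:serverName=$2"
       props="$props:portNumber=5432:JDBC30DataSource=true"
       run "$ASADMIN" create-jdbc-connection-pool \
           --datasourceclassname org.postgresql.ds.PGPoolingDataSource \
           --restype javax.sql.DataSource \
           --property "$props" \
           dvnDbPool
       run "$ASADMIN" create-jdbc-resource --connectionpoolid dvnDbPool jdbc/VDCNetDS
   }

Running it with ``DRY_RUN=1`` first prints both commands, which makes it easy to compare them with the settings listed above before anything is changed.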

JMS Resources
-----------------------------------------

Under Resources->JMS Resources:

#. Add a new Connection Factory for the DSB Queue:

   - JNDI Name: ``jms/DSBQueueConnectionFactory``
   - Resource Type: ``javax.jms.QueueConnectionFactory``

#. Add a new Connection Factory for the Index Message:

   - JNDI Name: ``jms/IndexMessageFactory``
   - Resource Type: ``javax.jms.QueueConnectionFactory``

#. Add a new Destination Resource for the DSB Queue:

   - JNDI Name: ``jms/DSBIngest``
   - Physical Destination Name: ``DSBIngest``
   - Resource Type: ``javax.jms.Queue``

#. Add a new Destination Resource for the Index Message:

   - JNDI Name: ``jms/IndexMessage``
   - Physical Destination Name: ``IndexMessage``
   - Resource Type: ``javax.jms.Queue``
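
A command-line sketch of the same four resources, using ``asadmin create-jms-resource``. It assumes ``asadmin`` is on the ``PATH`` and the domain is running; the function name and ``DRY_RUN`` switch are hypothetical. Creating the physical destinations explicitly with ``create-jmsdest`` is our own addition: the steps above only name them, and depending on your message broker settings they may also be created automatically.

.. code-block:: bash

   # Hypothetical helper: create the JMS connection factories and queues
   # listed above. DRY_RUN is our own switch.
   ASADMIN=${ASADMIN:-asadmin}

   # Run a command, or only print it when DRY_RUN=1.
   run() {
       if [ "${DRY_RUN:-0}" = 1 ]; then
           printf '%s\n' "$*"
       else
           "$@"
       fi
   }

   create_dvn_jms() {
       for factory in jms/DSBQueueConnectionFactory jms/IndexMessageFactory; do
           run "$ASADMIN" create-jms-resource \
               --restype javax.jms.QueueConnectionFactory "$factory"
       done
       for queue in DSBIngest IndexMessage; do
           # Physical destination in the message broker ...
           run "$ASADMIN" create-jmsdest --desttype queue "$queue"
           # ... and the JNDI resource that points at it.
           run "$ASADMIN" create-jms-resource \
               --restype javax.jms.Queue --property "Name=$queue" "jms/$queue"
       done
   }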

.. _postgresql-setup:

PostgreSQL setup
=======================

The following actions are normally performed by the automated installer
script. These steps are explained here for reference, and/or in case
you need to perform them manually:

1. Start as root, then change to user postgres:

   ``su postgres``

   Create the DVN database user (role):

   ``createuser -SrdPE [DB_USERNAME]``

   (you will be prompted to choose a user password).

   Create the DVN database:

   ``createdb [DB_NAME] --owner=[DB_USERNAME]``

   ``[DB_NAME]`` and ``[DB_USERNAME]`` are the names you choose for your DVN database and database user. These, together with the password you have assigned, will be used in the Glassfish configuration so that the application can talk to the database.

2. Before Glassfish can be configured for the DVN app, the Postgres driver needs to be installed in the <GLASSFISH ROOT>/lib directory. We supply a version of the driver known to work with the DVN in the dvninstall/pgdriver directory of the Installer bundle. (This is one of the steps listed in the :ref:`"What does the Installer do?" <what-does-the-intstaller-do>` section of this appendix.) An example of the installed location of the driver:

   ``/usr/local/glassfish/lib/postgresql-8.3-603.jdbc4.jar``

3. Finally, after the DVN application is deployed under Glassfish for the first time, the database needs to be populated with the initial content:

   | ``su postgres``
   | ``psql -d [DB_NAME] -f referenceData.sql``

   The file referenceData.sql is provided as part of the installer zip package.
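
For scripted (re)installs, the three steps can be wrapped in shell functions. This is a sketch, not part of the installer: the function names and ``DRY_RUN`` switch are hypothetical, it must be run as root, and the database and user names are assumed to be plain identifiers, because they are interpolated into ``su -c`` command strings without further quoting.

.. code-block:: bash

   # Hypothetical helpers for the manual PostgreSQL setup, run as root.

   # Run a command, or only print it when DRY_RUN=1.
   run() {
       if [ "${DRY_RUN:-0}" = 1 ]; then
           printf '%s\n' "$*"
       else
           "$@"
       fi
   }

   # Step 1: create the role (-P makes createuser prompt for its
   # password) and a database owned by it.
   create_dvn_db() {
       db_name=$1 db_user=$2
       run su postgres -c "createuser -SrdPE $db_user"
       run su postgres -c "createdb $db_name --owner=$db_user"
   }

   # Step 2: copy the bundled JDBC driver into <GLASSFISH ROOT>/lib.
   install_pg_driver() {
       driver_jar=$1 glassfish_root=$2
       run cp "$driver_jar" "$glassfish_root/lib/"
   }

   # Step 3: load the initial content after the first deployment.
   load_reference_data() {
       db_name=$1 sql_file=$2
       run su postgres -c "psql -d $db_name -f $sql_file"
   }

Because ``psql`` runs as the ``postgres`` user in step 3, that user must be able to read ``referenceData.sql``; passing an absolute path to a readable location avoids surprises.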

RedHat startup file for glassfish, example
====================================================

Below is an example of a glassfish startup file that you may want to
install on your RedHat (or similar) system to have glassfish start
automatically on boot.

Install the file as ``/etc/init.d/glassfish``, then run ``chkconfig glassfish on``.

Note the extra configuration steps before the domain start line, which
increase the file descriptor limit and allow "memory overcommit". These
are useful settings to have on a production server.

You may of course add extra custom configuration specific to your
setup.

.. code-block:: bash

   #! /bin/sh
   # chkconfig: 2345 99 01
   # description: GlassFish App Server
   set -e
   ASADMIN=/usr/local/glassfish/bin/asadmin
   case "$1" in
     start)
       echo -n "Starting GlassFish server: glassfish"
       # Increase file descriptor limit:
       ulimit -n 32768
       # Allow "memory overcommit":
       # (basically, this allows exec() calls from inside the
       # app without the Unix fork() call physically hogging 2X
       # the amount of memory glassfish is already using)
       echo 1 > /proc/sys/vm/overcommit_memory
       $ASADMIN start-domain domain1
       echo "."
       ;;
     stop)
       echo -n "Stopping GlassFish server: glassfish"
       $ASADMIN stop-domain domain1
       echo "."
       ;;
     *)
       echo "Usage: /etc/init.d/glassfish {start|stop}"
       exit 1
       ;;
   esac
   exit 0


Enabling secure remote access to Asadmin
========================================

As was mentioned in the Glassfish section of the manual, in version
3.1.2 the admin interface (asadmin) is configured to be accessible on
the localhost interface only. If you need to be able to access the admin
console remotely, you will have to enable secure access to it. (It will
be accessible over https only, at ``https://<YOUR HOST>:4848``; connections
to ``http://<YOUR HOST>:4848`` will be automatically redirected to the https
interface.)

The following must be done as root:

#. First you need to configure the admin password:

   ``<GF LOCATION>/glassfish3/bin/asadmin change-admin-password``

   (since you didn't create one when you were installing Glassfish, leave the "current password" blank, i.e., hit ENTER)

#. Enable the secure access:

   ``<GF LOCATION>/glassfish3/bin/asadmin enable-secure-admin``

(Note that you will need to restart Glassfish after step 2 above.)
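
The two steps, followed by the restart, can be run in sequence as in the sketch below. It assumes a Glassfish 3.1.2 installation under a ``<GF LOCATION>`` that you pass in, and the default domain ``domain1``; the function name and ``DRY_RUN`` switch are hypothetical.

.. code-block:: bash

   # Hypothetical helper: enable secure remote admin access; run as root.

   # Run a command, or only print it when DRY_RUN=1.
   run() {
       if [ "${DRY_RUN:-0}" = 1 ]; then
           printf '%s\n' "$*"
       else
           "$@"
       fi
   }

   # Usage: enable_remote_admin GF_LOCATION
   enable_remote_admin() {
       asadmin="$1/glassfish3/bin/asadmin"
       # Interactive: leave "current password" blank on a fresh install.
       run "$asadmin" change-admin-password
       run "$asadmin" enable-secure-admin
       # Secure admin only takes effect after a restart.
       run "$asadmin" restart-domain domain1
   }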

.. _using-lockss-with-dvn:

Using LOCKSS with DVN
=======================================

DVN holdings can be crawled by LOCKSS servers (`www.lockss.org <http://www.lockss.org>`__). This is made possible by a special plugin, developed and maintained by the DVN project, which a LOCKSS daemon uses to crawl and access materials served by a Dataverse network.

The current stable version of the plugin is available at the following location:

`http://lockss.hmdc.harvard.edu/lockss/plugin/DVNOAIPlugin.jar <http://lockss.hmdc.harvard.edu/lockss/plugin/DVNOAIPlugin.jar>`__

As of January 2013 and DVN version 3.3, the plugin is compatible with LOCKSS daemon version 1.55. The plugin sources can be found in the main DVN source tree at `https://dvn.svn.sourceforge.net/svnroot/dvn/dvn-app/trunk/src/DVN-lockss <https://dvn.svn.sourceforge.net/svnroot/dvn/dvn-app/trunk/src/DVN-lockss>`_ (please note that the DVN project is currently **in the process of moving to GitHub!** The preserved copy of the 3.3 source will be left at the URL above, together with information on the current location of the source repository).

In order to crawl a DVN, the following steps need to be performed:

#. Point your LOCKSS daemon to the plugin repository above (refer to the LOCKSS documentation for details).
#. Create a LOCKSS Archival Unit for your target DVN:

   In the LOCKSS Admin Console, go to **Journal Configuration** -> **Manual Add/Edit** and click on **Add Archival Unit**.

   On the next form, select **DVNOAI** in the pull-down menu under **Choose a publisher plugin** and click **Continue**.

   Next, configure the parameters that define your DVN Archival Unit. The LOCKSS daemon can be configured to crawl either the entire holdings of a DVN (no OAI set specified), or a selected Dataverse.

Note that LOCKSS crawling must be authorized on the DVN side. Refer to
the :ref:`"Edit LOCKSS Settings" <edit-lockss-harvest-settings>`
section of the DVN Network Administrator Guide for instructions on
enabling LOCKSS crawling at the network level, and/or to the
:ref:`Enabling LOCKSS access to the Dataverse <enabling-lockss-access-to-the-dataverse>`
section of the Dataverse Administration Guide. Once you allow LOCKSS crawling of
your Dataverse(s), you will need to enter the URL of the "LOCKSS
Manifest" page provided by the DVN in the configuration above. For the
network-wide archival unit this URL will be
``http://<YOUR SERVER>/dvn/faces/ManifestPage.xhtml``; for an
individual dataverse it is
``http://<YOUR SERVER>/dvn/dv/<DV ALIAS>/faces/ManifestPage.xhtml``.

The URL of the DVN OAI server is ``http://<YOUR DVN HOST>/dvn/OAIHandler``.
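
The manifest and OAI URLs follow fixed patterns, so they can be generated rather than typed by hand. A small sketch; the function names are hypothetical, and plain ``http`` is assumed, as in the URLs above.

.. code-block:: bash

   # Hypothetical helpers: build the LOCKSS-related URLs described above.

   # Usage: lockss_manifest_url HOST [DATAVERSE_ALIAS]
   # Without an alias, prints the network-wide manifest page URL.
   lockss_manifest_url() {
       host=$1
       dv_alias=${2:-}
       if [ -n "$dv_alias" ]; then
           printf 'http://%s/dvn/dv/%s/faces/ManifestPage.xhtml\n' "$host" "$dv_alias"
       else
           printf 'http://%s/dvn/faces/ManifestPage.xhtml\n' "$host"
       fi
   }

   # Usage: oai_handler_url HOST
   oai_handler_url() {
       printf 'http://%s/dvn/OAIHandler\n' "$1"
   }

For example, ``lockss_manifest_url dvn.example.edu mydv`` prints ``http://dvn.example.edu/dvn/dv/mydv/faces/ManifestPage.xhtml`` (the host and alias are placeholders).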

Read Only Mode
===================

A Read Only Mode has been established in DVN to allow the application to remain available while new versions or patches are being deployed. Users will be able to view data and metadata, but will not be able to add or edit anything. Currently there is no way to switch to Read Only Mode through the application.
In order to change the application mode, you must apply the following queries through ``psql`` or ``pgAdmin``:

To set to Read Only Mode:

| ``BEGIN;``
| ``SET TRANSACTION READ WRITE;``
| ``-- Note database and user strings may have to be modified for your particular installation;``
| ``-- You may also customize the status notice which will appear on all pages of the application;``
| ``update vdcnetwork set statusnotice = 'This network is currently in Read Only state. No saving of data will be allowed.';``
| ``ALTER DATABASE "dvnDb" set default_transaction_read_only=on;``
| ``Alter user "dvnApp" set default_transaction_read_only=on;``
| ``END;``

To return to regular service:

| ``BEGIN;``
| ``SET TRANSACTION READ WRITE;``
| ``-- Note database and user strings may have to be modified for your particular installation;``
| ``ALTER DATABASE "dvnDb" set default_transaction_read_only=off;``
| ``Alter user "dvnApp" set default_transaction_read_only=off;``
| ``update vdcnetwork set statusnotice = '';``
| ``END;``
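
Because the same statements are applied in both directions, a small shell function can generate them for your own database and role names and pipe them into ``psql``. This is a sketch under assumptions: the function name is hypothetical, the default names ``dvnDb`` and ``dvnApp`` are taken from the queries above, and you must connect as a user allowed to run ``ALTER DATABASE`` and ``ALTER USER``.

.. code-block:: bash

   # Hypothetical helper: emit the Read Only Mode statements shown above.
   # Usage: read_only_sql on|off [DATABASE] [ROLE] [NOTICE]
   read_only_sql() {
       mode=$1
       db=${2:-dvnDb}
       role=${3:-dvnApp}
       notice=${4:-This network is currently in Read Only state. No saving of data will be allowed.}
       case "$mode" in
           on) ;;
           off) notice= ;;   # returning to service clears the notice
           *) echo "usage: read_only_sql on|off [DATABASE] [ROLE] [NOTICE]" >&2
              return 1 ;;
       esac
       # SQL string literals use single quotes; double any embedded ones.
       notice=$(printf '%s' "$notice" | sed "s/'/''/g")
       # READ WRITE lets the statements run even in a session that is
       # already read-only by default.
       printf '%s\n' \
           'BEGIN;' \
           'SET TRANSACTION READ WRITE;' \
           "UPDATE vdcnetwork SET statusnotice = '$notice';" \
           "ALTER DATABASE \"$db\" SET default_transaction_read_only = $mode;" \
           "ALTER USER \"$role\" SET default_transaction_read_only = $mode;" \
           'END;'
   }

For example, ``read_only_sql on | psql -d dvnDb`` switches to Read Only Mode, and ``read_only_sql off | psql -d dvnDb`` returns to regular service.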

Backup and Restore
================================

**Backup**

The PostgreSQL database and study files (contained within the Glassfish directory by default, but this is :ref:`configurable via JVM options <jvm-options>`) are the most critical components to back up. The use of standard PostgreSQL tools (e.g. ``pg_dump``) is recommended.

Glassfish configuration files (e.g. domain.xml, robots.txt) and local
customizations (e.g. images in the docroot) should be backed up as well.
In practice, it is best to simply back up the entire Glassfish directory,
as other files such as logs may be of interest.

**Restore**

Restoring DVN consists of restoring the PostgreSQL database and the
Glassfish directory.
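
A minimal backup along these lines can be scripted with ``pg_dump`` and ``tar``. The sketch assumes the default layout, in which study files live inside the Glassfish directory (if you moved them with ``-Dvdc.study.file.dir``, archive that location too); the function names, the file naming scheme and the ``DRY_RUN`` switch are hypothetical.

.. code-block:: bash

   # Hypothetical helpers: back up and restore the DVN database and the
   # Glassfish directory.

   # Run a command, or only print it when DRY_RUN=1.
   run() {
       if [ "${DRY_RUN:-0}" = 1 ]; then
           printf '%s\n' "$*"
       else
           "$@"
       fi
   }

   # Usage: backup_dvn DB_NAME GLASSFISH_DIR BACKUP_DIR
   backup_dvn() {
       db_name=$1 gf_dir=$2 dest=$3
       stamp=$(date +%Y%m%d)
       # Custom-format dump (-Fc), restorable with pg_restore.
       run pg_dump -Fc -f "$dest/$db_name-$stamp.dump" "$db_name"
       # Archive the whole Glassfish tree: config, docroot, logs, files.
       run tar -czf "$dest/glassfish-$stamp.tar.gz" \
           -C "$(dirname "$gf_dir")" "$(basename "$gf_dir")"
   }

   # Usage: restore_dvn_db DB_NAME DUMP_FILE
   restore_dvn_db() {
       run pg_restore -d "$1" "$2"
   }

``restore_dvn_db`` expects the target database to exist already, for example one created with ``createdb`` as described in the PostgreSQL setup section; the Glassfish archive is restored by unpacking it back into place with ``tar -xzf``.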