| 3 | |
| 4 | == Viewing the images == |
| 5 | Once it has been decided which books are to be typed, the scans of |
| 6 | each book are examined. The following issues are important: |
| 7 | |
| 8 | === Quality of the scans === |
| 9 | This issue is not necessarily part of the workflow, as the library's |
| 10 | digigroup normally takes care of it (most of the books are scanned |
| 11 | at the MPIWG). Nevertheless, some scans might be faulty. It also |
| 12 | happens that some JPEG files get corrupted while being uploaded to |
| 13 | the FTP server (or while being downloaded by Formax). This affects |
| 14 | about ten pages in a work order of 5,000 pages and seems to be a |
| 15 | problem of the JPEG format. |
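Broken files of this kind can often be spotted before a work order is sent out. The following is a minimal sketch, not part of the existing workflow: it assumes the scans are available as `*.jpg` files in a local folder and only checks that each file starts with the JPEG start-of-image marker and ends with the end-of-image marker. The function names `looks_complete_jpeg` and `find_suspect_scans` are hypothetical.

```python
from pathlib import Path

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def looks_complete_jpeg(data: bytes) -> bool:
    """Heuristic: the data begins with SOI and ends with EOI."""
    return data[:2] == SOI and data[-2:] == EOI


def find_suspect_scans(folder):
    """Return the names of *.jpg files in `folder` that fail the marker check."""
    suspects = []
    for path in sorted(Path(folder).glob("*.jpg")):
        if not looks_complete_jpeg(path.read_bytes()):
            suspects.append(path.name)
    return suspects
```

This is only a heuristic. A file that was cut off during transfer will usually fail the check, but a file that passes can still be damaged in the middle, and a valid file with extra bytes after the end marker would be flagged wrongly. Flagged pages should therefore be opened and checked by eye.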
| 16 | |
| 17 | |
| 18 | === Structural organization of the book === |
| 19 | The book in question has to be examined against the DE Specs |
| 20 | document (the latest version can be found in the versioned file |
| 21 | folder). The DE Specs document should cover most of the cases that |
| 22 | might occur in a book. |
| 23 | |
| 24 | The instructions are kept simple and cover even the very basic issues |
| 25 | of how a book should be typed. It is recommended to be familiar |
| 26 | with the contents of the DE Specs. |
| 27 | |
| 28 | It is not necessary to examine each page of a book closely. |
| 29 | After the first few pages of the main matter it should be obvious how |
| 30 | the book is structured. The rest can be viewed via the thumbnails (if |
| 31 | one views the images using digilib, which is recommended because of |
| 32 | the documentation in the wiki, see below). |
| 33 | |
| 34 | Looking at the thumbnails is usually enough to spot difficult parts of |
| 35 | a book. One can easily tell pages with plain text from pages containing |
| 36 | other structures, e.g. tables, indexes or images. |
| 37 | |
| 38 | Difficulties on pages with plain text can basically only arise from |
| 39 | unknown or unreadable characters. |
| 40 | |
| 41 | |
| 42 | === Printing, special characters === |
| 43 | Spotting difficult characters of course requires the page to be |
| 44 | examined very closely. The same goes for damage to the book, which |
| 45 | might occur here and there. The DE Specs contain a flow chart that |
| 46 | tells the data entry firm what to do with an unknown character |
| 47 | (check whether it is in Unicode, check whether it is in the unknown |
| 48 | characters list). |
| 49 | |
| 50 | == Documenting in the wiki == |
| 51 | |
| 52 | The wiki is the central place for storing |
| 53 | information about the texts. For each work order sent to the data |
| 54 | entry firms, there should be an information page. Each of these |
| 55 | overview pages contains links to the images of every book in the work |
| 56 | order. These pages are based on a template which documents the whole |
| 57 | workflow. The template contains the following elements: |
| 58 | - links to the image scans and to the overview page of the work order |
| 59 | - which version of the DE Specs was used and whether special |
| 60 | instructions were given |
| 61 | - when the images were sent and when the text was returned |
| 62 | - information about expected difficulties |
| 63 | - questions by the data entry firm and the answers given |
| 64 | - analysis of the results |
| 65 | - post-processing |
| 66 | |
| 67 | === Expected difficulties === |
| 68 | During the examination of the images (see above), difficult structures |
| 69 | should be noted. This may apply to things that are not covered in the |
| 70 | DE Specs and have to be dealt with in the future, or to things that |
| 71 | have gone wrong a few times in the past. Links to the difficult |
| 72 | passages should be provided using the features of the digilib tool. |
| 73 | |
| 74 | The digilib tool is sometimes unresponsive, lacks features |
| 75 | (buttons) or is not available at all (heavy traffic?). A possible |
| 76 | workaround is to download the images from the server and view them |
| 77 | directly. However, marking and linking things in the wiki is then |
| 78 | not possible. |
| 79 | |
| 80 | === Analysis of the results === |
| 81 | At some point, the data entry will be finished and the corresponding |
| 82 | *.txt file will be returned. It might be accompanied by an updated |
| 83 | version of the unknown characters list. This list should be checked |
| 84 | to see which characters were added. |
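Finding the added characters can be done mechanically. The following is a minimal sketch under a loud assumption: the format of the unknown characters list is not described here, so the code simply treats both versions of the list as plain text with one entry per line. The function name `added_entries` is hypothetical.

```python
def added_entries(old_lines, new_lines):
    """Return entries that appear in new_lines but not in old_lines.

    Assumes one list entry per line; blank lines are ignored and the
    order of the new list is preserved.
    """
    old = {line.strip() for line in old_lines if line.strip()}
    added = []
    seen = set()
    for line in new_lines:
        entry = line.strip()
        if entry and entry not in old and entry not in seen:
            added.append(entry)
            seen.add(entry)
    return added
```

The two arguments can be any iterables of strings, for example open text files or the result of `str.splitlines()`. If the real list has a richer format (several columns, images of the glyphs), the comparison key would have to be adapted accordingly.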
| 85 | |
| 86 | One obvious part of this analysis is of course to check how the data |
| 87 | entry firm fared with the expected difficulties. This is done by |
| 88 | looking up the passages in the received text file and judging how they |
| 89 | were typed. In any case, the results should be written down in the |
| 90 | wiki. Later on, a decision has to be made about what to do if there |
| 91 | is an error in the typing. The options are: |
| 92 | - correct the mistake silently |
| 93 | - ask the data entry firm to redo the part |
| 94 | |
| 95 | /Add something about the Special Instructions/ |
| 96 | |