Changes between Version 1 and Version 2 of Preparing the documents for keying


Timestamp:
Aug 20, 2009, 4:38:32 PM (15 years ago)
Author:
Klaus Thoden
Comment:

--

  • Preparing the documents for keying

    v1 v2  
    11= Workflow =
    22
     3
     4== Viewing the images ==
     5Once it has been decided which books are to be typed, the scans of
     6those books are examined. The following issues are important:
     7
     8=== Quality of the scans ===
     9This issue is not necessarily part of the workflow, as the library's
     10digigroup normally takes care of it (most of the books are
     11scanned at the MPIWG). However, some scans may be faulty
     12nevertheless. It also happens that some JPEG files get corrupted while
     13being uploaded to the FTP server (or while being downloaded
     14by Formax). This affects about ten pages in a work
     15order of 5,000 pages. It seems to be a problem of the JPEG format.
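
Broken files of this kind can often be spotted before a work order is sent. The following is a minimal sketch, not part of the existing workflow: it assumes the scans are available locally as `*.jpg` files, and it relies only on the fact that a complete JPEG file starts with the SOI marker (`FF D8`) and ends with the EOI marker (`FF D9`). The function names are hypothetical. The check is a heuristic; it catches truncated transfers but not every kind of damage.

```python
from pathlib import Path

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def looks_complete(data: bytes) -> bool:
    """Heuristic: a complete JPEG starts with SOI and ends with EOI."""
    return data.startswith(SOI) and data.endswith(EOI)


def find_suspect_scans(folder: str) -> list[str]:
    """Return the names of *.jpg files in `folder` that fail the heuristic."""
    suspects = []
    for path in sorted(Path(folder).glob("*.jpg")):
        if not looks_complete(path.read_bytes()):
            suspects.append(path.name)
    return suspects
```

Running `find_suspect_scans` on the folder before upload and again on the downloaded copy would show at which step a file was damaged.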
     16
     17
     18=== Structural organization of the book ===
     19The book in question has to be examined according to the DE Specs
     20document (latest version, which can be found in the versioned file
     21folder). The DE Specs document should cover most of the cases that
     22might occur in a book.
     23
     24The instructions are kept simple and cover even the most basic issues
     25of how a book should be typed. It is recommended to be familiar
     26with the contents of the DE Specs.
     27
     28It is not necessary to examine each page of a book closely.
     29After the first few pages of the main matter it should be obvious how
     30the book works. The rest can be viewed via the thumbnails, provided one
     31views the images using digilib, which is recommended because of the
     32documentation in the wiki (see below).
     33
     34Looking at the thumbnails is usually enough to spot difficult parts of
     35a book. One can easily tell pages with pure text from pages containing
     36other structures, e.g. tables, indexes or images.
     37
     38Difficulties on pages with plain text can basically arise only at the
     39level of unknown or unreadable characters.
     40
     41
     42=== Printing, special characters ===
     43Spotting difficult characters is of course a task for which the page has
     44to be examined very closely. The same goes for damage to
     45the book, which may occur here and there. The DE Specs contain
     46a flow chart that tells the data entry firm what to do
     47with an unknown character (check whether it is
     48in Unicode, check whether it is in the unknown characters list).
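
The two checks named above can be illustrated in code. This is only a sketch of the order of the checks, not a reproduction of the flow chart in the DE Specs: it assumes that a glyph is identified by a descriptive name, that the unknown characters list can be represented as a mapping from such names to a transcription, and that a glyph found in neither place becomes a candidate for a new list entry. The function name and the example entries are hypothetical.

```python
import unicodedata


def resolve_glyph(name: str, unknown_chars: dict[str, str]) -> tuple[str, str]:
    """Return (decision, value) for a glyph identified by a descriptive name."""
    try:
        # Step 1: is there a Unicode character with this name?
        return ("unicode", unicodedata.lookup(name))
    except KeyError:
        pass
    # Step 2: is the glyph already on the unknown characters list?
    if name in unknown_chars:
        return ("listed", unknown_chars[name])
    # Otherwise it is a candidate for a new list entry (assumed outcome).
    return ("new", name)
```

For example, `resolve_glyph("LATIN SMALL LETTER LONG S", {})` resolves to the Unicode character U+017F, while a name that is neither in Unicode nor on the list is reported as new.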
     49
     50== Documenting in the wiki ==
     51
     52The wiki is the central place for storing
     53information about the texts. For each work order sent to the data
     54entry firms, there should be an information page. Each of these
     55overview pages contains links to the images of every book in the work
     56order. These pages are based on a template which documents the whole
     57work flow. It contains the following elements:
     58 - links to the image scans and to the overview page of the work order
     59 - information on which version of the DE Specs was used and whether special
     60   instructions were given
     61 - when the images were sent and when the text was returned
     62 - information about expected difficulties
     63 - Questions by the data entry firm and answers given
     64 - Analysis of the results
     65 - Post processing
     66
     67=== Expected difficulties ===
     68During the examination of the images (see above), difficult structures
     69should be noted. This may apply to things that are not covered in the DE Specs
     70and have to be dealt with in the future, or to things that have
     71gone wrong a few times in the past. Links to the difficulties should
     72be provided by using the features of the digilib tool.
     73
     74The digilib tool is sometimes unresponsive, lacks features
     75(buttons) or is not available at all (heavy traffic?). A possible
     76workaround is to download the images from the server and view them
     77directly. However, marking and linking things in the wiki is then not
     78possible.
     79
     80=== Analysis of the results ===
     81At some point, the data entry will be finished and the corresponding *.txt
     82file will have been returned. That file may be accompanied by an updated
     83version of the unknown characters list. This list should be checked
     84for newly added characters.
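
Finding the added characters amounts to comparing the old and the new version of the list. A minimal sketch, assuming both versions are plain text with one entry per line (the actual format of the list may differ; the function name is hypothetical):

```python
def added_entries(old_list: str, new_list: str) -> list[str]:
    """Return entries that occur in new_list but not in old_list, in order."""
    known = {line.strip() for line in old_list.splitlines() if line.strip()}
    added = []
    for line in new_list.splitlines():
        entry = line.strip()
        if entry and entry not in known:
            added.append(entry)
            known.add(entry)  # report each new entry only once
    return added
```

The entries returned this way are the ones to review and to record on the wiki page of the work order.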
     85
     86One obvious part of this analysis is to check how the data
     87entry firm fared with the expected difficulties. This is done by
     88looking up the passages in the received text file and judging how they
     89were typed. In any case, the results should be written down in the
     90wiki. Later on, a decision has to be made on what to do if there is an
     91error in the typing. The possibilities are:
     92 - correct the mistake silently
     93 - ask the data entry firm to redo the part
     94
     95/Add something about the Special Instructions/
     96