Preparing the documents for keying

Viewing the images

Once it has been decided which books are to be typed, the scans of those books are reviewed. The following issues are important:

Quality of the scans

This issue is not necessarily part of the workflow, as the library's digigroup normally takes care of it (most of the books are scanned at the MPIWG). Nevertheless, some scans may be faulty. It also happens that some JPEG files are corrupted while being uploaded to the FTP server (or while being downloaded by the data entry firm). This affects about ten pages in a work order of 5,000 pages, i.e. roughly 0.2%. It seems to be a problem of the JPEG format.
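A quick check for corrupted files could be sketched as follows. This is only a heuristic, not part of the documented workflow: it assumes the scans are available locally as *.jpg files, and it only tests whether each file starts with the JPEG start-of-image marker and ends with the end-of-image marker. The function names and the folder argument are hypothetical.

```python
from pathlib import Path

JPEG_START = b"\xff\xd8"  # SOI (start of image) marker
JPEG_END = b"\xff\xd9"    # EOI (end of image) marker


def looks_intact(data: bytes) -> bool:
    """Heuristic only: an intact JPEG starts with SOI and normally ends with EOI."""
    return data.startswith(JPEG_START) and data.endswith(JPEG_END)


def find_suspect_scans(folder: str) -> list:
    """Return the names of *.jpg files in `folder` that fail the heuristic."""
    suspects = []
    for path in sorted(Path(folder).glob("*.jpg")):
        if not looks_intact(path.read_bytes()):
            suspects.append(path.name)
    return suspects
```

A file truncated during transfer usually loses its end marker, so it would show up in the list returned by find_suspect_scans. A file that passes the check can still be damaged in the middle, so a visual check of the thumbnails remains useful.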

Structural organization of the book

The book in question has to be examined against the DE Specs document (the latest version can be found in the versioned file folder). The DE Specs should cover most of the cases that might occur in a book.

The instructions are kept simple and cover even the most basic questions of how a book should be typed. Familiarity with the contents of the DE Specs is recommended.

It is not necessary to examine each page of a book closely. After the first few pages of the main matter, it should be obvious how the book works. The rest can be viewed via the thumbnails (if the images are viewed using digilib, which is recommended because of the documentation in the wiki; see below).

Looking at the thumbnails is usually enough to spot the difficult parts of a book. Pages with pure text are easily told apart from pages containing other structures, e.g. tables, indexes or images.

In pages with plain text, difficulties can essentially arise only from unknown or unreadable characters.

Printing, special characters

Spotting difficult characters does, of course, require examining the page very closely. The same goes for damage to the book, which may occur here and there. The DE Specs contain a flow chart that tells the data entry firm what to do with an unknown character (check whether it is in Unicode, check whether it is in the unknown characters list).
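The two checks of the flow chart can be illustrated with a short sketch. The flow chart itself is not reproduced here, so the following is an assumption about its shape: the order of the checks follows the text above, while the third outcome (adding a new entry to the unknown characters list), the representation of the list as a dictionary and all names are hypothetical.

```python
from enum import Enum
from typing import Dict, Optional


class Action(Enum):
    TYPE_UNICODE = "type the Unicode character"
    USE_LIST_CODE = "use the code from the unknown characters list"
    ADD_TO_LIST = "add a new entry to the unknown characters list"  # assumed outcome


def decide(unicode_match: Optional[str],
           description: str,
           unknown_characters: Dict[str, str]) -> Action:
    """Decide how to key a glyph that the typist does not recognize.

    unicode_match: the Unicode character found for the glyph, or None.
    description: a hypothetical key describing the glyph in the list.
    unknown_characters: hypothetical mapping from description to code.
    """
    # First check: is the character available in Unicode?
    if unicode_match is not None:
        return Action.TYPE_UNICODE
    # Second check: is it already in the unknown characters list?
    if description in unknown_characters:
        return Action.USE_LIST_CODE
    # Otherwise (assumption): record it as a new unknown character.
    return Action.ADD_TO_LIST
```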

Documenting in the wiki

The wiki is the central place for storing information about the texts. For each work order sent to the data entry firms, there should be an information page. Each of these overview pages contains links to the images of every book in the work order. These pages are based on a template which documents the whole workflow. It contains the following elements:

  • links to the image scans and to the overview page of the work order
  • information on which version of the DE Specs was used and whether special instructions were given
  • when the images were sent and when the text was returned
  • information about expected difficulties
  • Questions by the data entry firm and answers given
  • Analysis of the results
  • Post processing
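To illustrate what such a template records, the elements listed above can be written down as a small data structure. This is a sketch only: the wiki template itself is not shown here, and the class name, the field names and the helper method are hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class BookPage:
    """One information page; field names are illustrative, not the wiki's."""
    book: str
    image_links: List[str] = field(default_factory=list)
    work_order_page: Optional[str] = None
    de_specs_version: Optional[str] = None
    special_instructions: Optional[str] = None
    images_sent: Optional[str] = None
    text_returned: Optional[str] = None
    expected_difficulties: List[str] = field(default_factory=list)
    questions_and_answers: List[str] = field(default_factory=list)
    analysis: Optional[str] = None
    post_processing: Optional[str] = None

    def open_items(self) -> List[str]:
        """Names of the elements that have not been filled in yet."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) in (None, [])]
```

Calling open_items() on a freshly created page lists every element except the book name, which gives a simple overview of what still has to be documented for that book.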

Expected difficulties

During the examination of the images (see above), difficult structures should be noted. These may be things not covered in the DE Specs that will have to be dealt with in the future, or things that have gone wrong a few times in the past. Links to the difficult passages should be provided using the features of the digilib tool.

The digilib tool is sometimes unresponsive, lacks features (buttons) or is not available at all (heavy traffic?). A possible workaround is to download the images from the server and view them directly. However, marking and linking things in the wiki is then not possible.

Special Instructions

Sometimes, a book in a work order contains special features that are not covered by the DE Specs. In this case, Special Instructions are written to cover that specific case. They are added to the work order as a PDF document.

Some examples

  • First Special Instructions (SI): text flows in WO1_Conimbricenses_1606. These are not in the current version of the DE Specs, but are likely to be included in the future, as they occur frequently.
  • Specifications for typing tables were added gradually. Originally a separate instruction for WO1_Ghetaldi_1603, they have since been included in the main document.
  • In the case of WO7_Vitruvius_1511, it was important that information about handwritten annotations and emendations be preserved. The basic <hd> tag from the DE Specs was not sufficient. In cases like these, it is very likely that the Special Instructions will be incorporated into the next version of the DE Specs.

In other cases, however, Special Instructions are used merely to make sure that a special case in one specific book is coded correctly. They can then be a reaction to a question from the data entry firm, if a simple answer by mail is not sufficient (or is it rather the case that we did this in the beginning and then shifted to using only mail?).

The overview pages of the work orders, as well as the pages for each book, state whether the book was sent with Special Instructions.

Analysis of the results

At some point, the data entry will be finished and the corresponding *.txt file will have been returned. The file may be accompanied by an updated version of the unknown characters list. This list should be examined to see which characters were added.
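Finding the added characters amounts to comparing the old and the updated list. A minimal sketch, assuming that both versions of the list are available as plain text with one entry per line (the actual format of the list is not described here, and the function name is hypothetical):

```python
def added_entries(old_lines, new_lines):
    """Return entries that occur in the new list but not in the old one.

    The order of the new list is kept; blank lines and repeats are skipped.
    """
    old = {line.strip() for line in old_lines if line.strip()}
    added = []
    for line in new_lines:
        entry = line.strip()
        if entry and entry not in old and entry not in added:
            added.append(entry)
    return added
```

With files on disk, the two arguments could simply be the open file objects of the old and the new list, since iterating over a text file yields its lines.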

One obvious part of this analysis is, of course, to check how the data entry firm fared with the expected difficulties. This is done by looking up the passages in the received text file and judging how they were typed. In any case, the results should be written down in the wiki. Later on, a decision has to be made about what to do if there is an error in the typing. The possibilities are:

  • correct the mistake silently
  • ask the data entry firm to redo the part
Last modified on Dec 1, 2010, 9:41:24 AM