First evaluation

Version 24 (modified by Klaus Thoden, 16 years ago)


The first fifty pages of Diversae, "Conimbricenses In Universam dialecticam" (1606); Benedetti, Giovanni Battista de, "Diversarvm specvlationvm mathematicarum, et physicarum liber" (1585); and Euclid, "Elementorum Libri XV" (1607) were digitized and sent back for evaluation. In general, the results are very good.

Unfortunately, the work sample does not contain a page of the Conimbricenses where the Special Instructions apply.

PDF versions of the work samples are attached. In these PDF versions, the font is Helvetica 12pt (10pt for Benedetti), blank lines have been inserted before <pb> tags, and < > { } _ are in bold face.

Offsets ECHO - page numbers in the book: Diversae 2, Benedetti 12

What does work

  • Letters with swashes are recognized, except for this Quod, which was transcribed as Luod. Character recognition accuracy is surprisingly high, e.g. Conimbricenses, p. 3.
  • The list of unknown characters is used (two characters so far), and unreadable text is marked up accurately.
  • Multiline headings are recognized, possibly because of punctuation.
  • Both methods of marking up italics in headings are used:
    <h it>TRACTATVS QVI IN HOC
    volumine continentur.</h>
    

(Benedetti, p. 6)

<h>_Theoremata Arithmetica._</h>

(Benedetti, p. 13)

  • Library stamps are either typed:
    <h>MAX-PLANCK-INB<?>TITUT
    $<?>UR WISS<?>ENSCN<?>AF<?>T@@@@CHICHTE</h>
    <h>Bibliothek</h>
    

or coded as <fig>:

<h><red>E SOCIETATE IESV,</r></h>
<h>_IN VNIVERSAMDIA_
_Iecticam Ari$totelis Stagiritæ_</h>
<fig>
<fig>
  • Parentheses work well. There is only one example with spaces within parentheses (Benedetti, p. 9), and there the original has spaces as well.
  • The list of unknown characters works well and is evidently updated frequently. Unknown character <010>, however, is represented by a wrong image. The unknown character in question is this one. Unknown characters <006> and <011> do not occur in the work samples; characters <012> and <014> occur in the text but are not on the list (yet?).

Small problem with the list: there is only one list for all documents. Was this intended?
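
The comparison between codes used in the text and codes on the list could be automated. The following is a minimal sketch, assuming that unknown-character codes always appear as three digits in angle brackets (as in <010>). The function name `compare_codes`, the sample text, and the sample list are hypothetical and made up for illustration; they are not taken from the work samples.

```python
import re

# Assumption: an unknown-character code is exactly three digits in angle
# brackets, e.g. <010>. The unreadable-text marker <?> is deliberately ignored.
CODE_RE = re.compile(r"<(\d{3})>")


def compare_codes(transcription, listed_codes):
    """Return (codes used but not listed, codes listed but not used)."""
    used = set(CODE_RE.findall(transcription))
    listed = set(listed_codes)
    return sorted(used - listed), sorted(listed - used)


# Hypothetical data that mirrors the situation described above.
listed = ["006", "010", "011"]
text = "ab<010>cd <012> ef<014> gh<?>"

missing, unused = compare_codes(text, listed)
print("used but not on the list:", missing)
print("on the list but not used:", unused)
```

Run over all transcriptions of one work, such a check would also show whether a single shared list or one list per document is more practical.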

What does not work

  • The <red> tag is always closed by an </r> tag rather than by a matching closing tag (as in the <h><red> example above).
  • Some ornamental figures are not tagged, e. g. this one.
  • Various mistypings:
    • $in rather than $m
    • f rather than $ (∫) (frequently)
    • ÿ rather than {ij} (italics only, frequently)
    • b rather than h (italics only, frequently)
    • œ rather than æ (italics only)
  • The number 10 becomes <sc>IO</sc> in Euclid, p. 13. A date on the same page is recognized correctly.
  • Greek ligatures
    • A variant letterform of τ was recognized, but a τ in the same word (!) was typed as T, as in άγεωμέΤρητ@ (Euclid, p. 9). In the next word it is typed correctly.
  • This Vale. has been taken for a catchword.
  • Tartaglia (1565), p. 16: the first two figures are coded as one figure, while the two at the bottom are coded separately. The caption works.
  • What happens with spaces in the text like this one? Are they meaningful?
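
The <red> ... </r> problem noted in the list above is the kind of error a simple script can find. The following is a minimal sketch, not part of the DESpecs: it assumes that tag names consist of letters only, that arguments such as `it` follow the name after a space, and that <pb> and <fig> never take a closing tag. The set `EMPTY_TAGS` and the function name `find_mismatched_closers` are hypothetical.

```python
import re

# Assumption: these tags stand alone and are never closed.
EMPTY_TAGS = {"pb", "fig"}

# Matches <h>, <h it>, </h>, <red>, </r>; skips <?> and numeric codes like <010>.
TAG_RE = re.compile(r"<(/?)([A-Za-z]+)[^<>]*>")


def find_mismatched_closers(text):
    """Return (opened, closed) name pairs where a closing tag does not
    match the innermost open tag. `opened` is None for a stray closer."""
    stack = []
    mismatches = []
    for match in TAG_RE.finditer(text):
        is_closer = match.group(1) == "/"
        name = match.group(2)
        if not is_closer:
            if name not in EMPTY_TAGS:
                stack.append(name)
            continue
        if not stack:
            mismatches.append((None, name))
            continue
        opened = stack.pop()
        if opened != name:
            mismatches.append((opened, name))
    return mismatches


# Sample line taken from the Conimbricenses transcription quoted earlier.
sample = "<h><red>E SOCIETATE IESV,</r></h>"
print(find_mismatched_closers(sample))
```

The sketch treats the mismatched closer as closing the innermost open tag, so a single wrong </r> is reported once and the surrounding <h> ... </h> pair is still accepted.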

Adjustments to be made

  • DESpecs 1.1.2 does not state that the <mg{l|r}> tag may take the it argument, so the _ _ markup is used consistently there instead. The specs should allow the it argument for this tag.

Attachments (5)
