The first few pages of Diversae "Conimbricenses In Universam dialecticam" (1606), Benedetti, Giovanni Battista de "Diversarvm specvlationvm mathematicarum, et physicarum liber" (1585) and Euclid "Elementorum Libri XV" (1607) were digitized and sent back for evaluation. In general, the results are very good.
Unfortunately, the work sample does not contain a page of the Conimbricenses where the Special Instructions apply.
PDF versions of the work samples are attached. In these PDF versions, the font is Helvetica 12pt (10pt for Benedetti), blank lines have been inserted before <pb> tags, and < > { } _ are in bold face.
Offsets ECHO - page numbers in the book: Diversae 2, Benedetti 12
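As a small illustration, the offsets can be applied mechanically. The sketch below assumes that each offset means "ECHO page index minus printed page number"; the direction is not stated explicitly here, and the function names are hypothetical.

```python
# Offsets as listed above. Reading each value as
# (ECHO page index) - (printed page number) is an assumption.
ECHO_OFFSETS = {"Diversae": 2, "Benedetti": 12}


def echo_to_book_page(work: str, echo_page: int) -> int:
    """Convert an ECHO page index to the page number printed in the book."""
    return echo_page - ECHO_OFFSETS[work]


def book_to_echo_page(work: str, book_page: int) -> int:
    """Convert a printed page number to the ECHO page index."""
    return book_page + ECHO_OFFSETS[work]
```

No offset is listed for Euclid, so a lookup for that work raises a KeyError rather than guessing a value.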
What does work
- Letters with swashes are recognized, except for one Quod, which was transcribed as Luod. Character recognition accuracy is surprisingly high, e. g. Conimbricenses, p. 3
- The list of unknown characters is used (two characters so far), and unreadable text is marked up accurately.
- Multiline headings are recognized, possibly because of punctuation
- Both methods of marking up italics in headings are used:
<h it>TRACTATVS QVI IN HOC volumine continentur.</h>
<h>_Theoremata Arithmetica._</h>
- Library stamps are either typed:
<h>MAX-PLANCK-INB<?>TITUT $<?>UR WISS<?>ENSCN<?>AF<?>T@@@@CHICHTE</h> <h>Bibliothek</h>
or coded as <fig>:
<h><red>E SOCIETATE IESV,</r></h> <h>_IN VNIVERSAMDIA_ _Iecticam Ari$totelis Stagiritæ_</h> <fig> <fig>
- Parentheses work well. Only one example has spaces within the parentheses (Benedetti, p. 9), and there the original has spaces as well.
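Since both italic heading markups occur in the samples, it may be convenient to normalize them before comparing transcriptions. The following is a minimal sketch, assuming that `<h>_…_</h>` and `<h it>…</h>` are equivalent when the whole heading is a single italic run; the function name is hypothetical, and headings with several italic runs are left untouched.

```python
import re

# Matches a heading whose entire content is one underscore-delimited italic run.
_WHOLE_ITALIC_HEADING = re.compile(r"<h>_([^_<>]+)_</h>")


def normalize_italic_headings(text: str) -> str:
    """Rewrite <h>_text_</h> as <h it>text</h>; leave all other headings as they are."""
    return _WHOLE_ITALIC_HEADING.sub(r"<h it>\1</h>", text)
```

Applied to the second example above, this yields `<h it>Theoremata Arithmetica.</h>`, while the first example is returned unchanged.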
What does not work
- The <red> tag is always closed by the </r> tag.
- Some ornamental figures are not tagged.
- Various mistypings occur frequently:
- bumanitate rather than humanitate
- œ rather than æ
- Number 10 becomes
<sc>IO</sc>
in Euclid, p. 13. A date on the same page is recognised correctly.
- Greek Ligatures
- Letter variation of τ was sometimes recognized and sometimes typed as T (as in
άγεωμέΤρητ@
(Euclid, p. 9)) or
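The `<red>` … `</r>` mismatch noted in this list can be found mechanically. The sketch below is a simple stack-based check, not part of the DESpecs; it assumes that `<fig>` and `<pb>` never take a closing tag and that placeholders such as `<?>` or `<001>` are not tags. The function name is hypothetical.

```python
import re

# Assumption: these tags never take a closing tag in the transcription markup.
VOID_TAGS = {"fig", "pb"}

# A tag is a letter-initial name plus optional arguments; closing tags start with "/".
# Placeholders such as <?> or <001> do not start with a letter and are skipped.
TAG = re.compile(r"<(/?)([A-Za-z]+)[^<>]*>")


def find_mismatched_closers(text):
    """Return (opening_name, closing_name) pairs where the two names differ."""
    stack, mismatches = [], []
    for match in TAG.finditer(text):
        is_close, name = match.group(1) == "/", match.group(2)
        if not is_close:
            if name not in VOID_TAGS:
                stack.append(name)
        elif stack:
            opened = stack.pop()
            if opened != name:
                mismatches.append((opened, name))
    return mismatches
```

For the heading `<h><red>E SOCIETATE IESV,</r></h>` this reports the single pair `("red", "r")`.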
Adjustments to be made
- DESpecs 1.1.2 does not state that the <mg{l|r}> tag may take the it argument. Thus, the _ _ markup is used consistently there. The Specs should allow the it argument for this tag.
Attachments (5)
- Diversae_part_001.pdf (374.1 KB) - added 16 years ago.
- Benedetti, Giovanni_part_001.pdf (228.6 KB) - added 16 years ago.
- Euclid_part_001.pdf (198.7 KB) - added 16 years ago.
- Diversae_part_002.pdf (349.7 KB) - added 16 years ago.
- Unknown Characters List.pdf (201.9 KB) - added 16 years ago. New version, characters <001>-<075>