The workflow will be adapted to accept OCRed text as input. OCRopus will serve as the OCR engine.
Tutorial video and other videos
The documents from the previous workflows were assessed for how well they are likely to perform when OCRed, and grouped by expected difficulty.
- Easy
  - Bernstein 1897 (Fraktur)
  - Berzelius 1819
  - Ampère 1844
- Medium
  - Vitruvius 1511
  - Cataneo 1600
  - Aristoteles 1547 (italics)
  - Archimedes 1565 (many pictures (problematic?))
  - Cataneo 1572 (bad printing)
  - Viviani 1659
  - Bianconi 1746
  - Zanotti 1752
  - Bion 1765 (blackletter)
  - Vitruvius 1800 (clear, but weak printing; the printing on the reverse side shows through)
  - Gallaccini 1767 (small font, but quite clear)
  - Angeli 1668 (contains italics, otherwise very clear)
  - Trigault 1639
  - Bernoulli 1738
- Hard
  - Vitruvius 1544
  - Vitruvius 1757 (mixed blackletter and antiqua)
  - Zonca 1656 (bad printing)
  - Bacon 1670 (contains italics, bad printing)
  - Clavius 1606
  - Barrow 1674 (bad printing, italics and Greek)
  - Gravesande 1721
  - Vitruvius 1618 (Thesaurus has columns)
  - Mersenne 1635 (microfilm)
  - Aristoteles 1548 (contains Greek)
  - Vitruvius 1556 (extremely small font)
  - Aristoteles 1585 (bad printing)
  - Specklin 1599 (bad printing, blackletter)
  - Biancani 1635 (very small font)
  - Vitruvius 1567 (very small font, mixed italics and upright)
  - Archimedes 1565 (mixed italics and upright)
Command overview
The following commands (taken from the video above) recognize English text:
ocropus-binarize 035.jpg
ocropus-pseg book/????.png
ocropus-lattices -m OCRopus/ocropy/2m2-reject.cmodel book/0001/??????.png
ocropus-align -l OCRopus/ocropy/data/default.fst book/0001/??????.fst
ocropus-hocr book/
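The five steps above could be wrapped in a small script so that they always run in the same order and stop at the first failure. The following is only a sketch: the function names `run` and `ocr_pipeline` and the variables `DRY_RUN`, `MODEL` and `LANGUAGE_MODEL` are hypothetical helpers introduced here, not part of OCRopus. The commands, options and paths are copied unchanged from the list above, and the script assumes they behave as shown there. By default it only prints the commands, so it can be tried without OCRopus installed.

```shell
#!/bin/sh
# Sketch of a wrapper around the five commands listed above.
# Hypothetical helpers: run, ocr_pipeline, DRY_RUN, MODEL, LANGUAGE_MODEL.

# Model paths exactly as given in the command overview.
MODEL="OCRopus/ocropy/2m2-reject.cmodel"
LANGUAGE_MODEL="OCRopus/ocropy/data/default.fst"

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Run all steps for one page image; "&&" stops the chain at the first failure.
# The globs are expanded only when each step is reached, so they pick up the
# files written by the previous step. As in the original command list, only
# the first page directory (book/0001) is processed.
ocr_pipeline() {
    page_image=$1
    run ocropus-binarize "$page_image" &&
    run ocropus-pseg book/????.png &&
    run ocropus-lattices -m "$MODEL" book/0001/??????.png &&
    run ocropus-align -l "$LANGUAGE_MODEL" book/0001/??????.fst &&
    run ocropus-hocr book/
}

# Dry run: prints the five commands for the example page.
ocr_pipeline 035.jpg
```

Calling `DRY_RUN=0 ocr_pipeline 035.jpg` would execute the commands instead of printing them. Keeping the dry run as the default makes it easy to check the sequence before starting a long recognition job.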
Attachments (1)
- result.html (4.1 KB): first OCR of Bacon, standard settings