The workflow will be adapted to accept OCRed text as input. The OCR engine will be [http://code.google.com/p/ocropus/ OCRopus]. See the [http://www.youtube.com/watch?v=pDYq0MlD8RQ tutorial video] and the [http://www.youtube.com/user/tmbdev#p/c/0B3367BC0E5CAF8D other videos].

The documents of the previous [wiki:OverviewWorkOrders2008 workflows] were assessed for how well they are likely to perform when OCRed.

 * Easy
   * Bernstein, 1897 (Fraktur)
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/fr/Berzelius_1819_WCWY69V2.xml Berzelius 1819]
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/de/Ampere_1844_FA4H1833.xml Ampère 1844]
 * Medium
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/la/Vitruvius_1511_XS9KA6WS.xml Vitruvius 1511]
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/la/Cataneo_1600.xml Cataneo 1600]
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/la/Aristoteles_1547_731P9Q9Y.xml Aristoteles 1547] (italics)
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/la/Archimedes_1565_4E7V2WGH.xml Archimedes 1565] (many pictures (problematic?))
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/it/Cataneo_1572_ZBAS6ZM1.xml Cataneo 1572] (bad printing)
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/la/Viviani_1659_QN4GHYBF.xml Viviani 1659]
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/it/Bianconi_1746_0347ZVRW.xml Bianconi 1746]
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/it/Zanotti_1752_16HBZHF5.xml Zanotti 1752]
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/de/Bion_1765_TGXUZC1H.xml Bion 1765] (blackletter)
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/la/Vitruvius_1800_V82APKX9.xml Vitruvius 1800] (clear but weak printing; print from the reverse side shows through)
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/it/Gallaccini_1767_D09WWP72.xml Gallaccini 1767] (small font, but quite clear)
   * [http://mpdl-dev.mpiwg-berlin.mpg.de/ECHOdocuViewfull?url=/mpiwg/online/permanent/library/54K7BX2W Angeli 1668] (contains italics, otherwise very clear)
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/la/Trigault_1639_QA0BRSXP.xml Trigault 1639]
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/la/Bernoulli_1738_AZ870BWE.xml Bernoulli 1738]
 * Hard
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/la/Vitruvius_1544_2UZM8E2N.xml Vitruvius 1544]
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/de/Vitruvius_1757_DNBVYAV4.xml Vitruvius 1757] (mixed blackletter and antiqua)
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/it/Zonca_1656_UR271U6Y.xml Zonca 1656] (bad printing)
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/en/Bacon_1670_WX8HY2V2.xml Bacon 1670] (contains italics, bad printing)
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/la/Clavius_1606_FBVYV7EH.xml Clavius 1606]
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/la/Barrow_1674_U19ERSE3.xml Barrow 1674] (bad printing, italics and Greek)
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/la/Gravesande_1721_KN9XTZRQ.xml Gravesande 1721]
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/fr/Vitruvius_1618_3XFC5KGV.xml Vitruvius 1618] (thesaurus has columns)
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/fr/Mersenne_1635_508_fr.xml Mersenne 1635] (microfilm)
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/la/Aristoteles_1548_9NN63YC9.xml Aristoteles 1548] (contains Greek)
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/it/Vitruvius_1556_XYTWCGV1.xml Vitruvius 1556] (extremely small font)
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/la/Aristoteles_1585_XSY685ZD.xml Aristoteles 1585] (bad printing)
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/de/Specklin_1599_SSM0YQED.xml Specklin 1599] (bad printing, blackletter)
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/la/Biancani_1635_GWS4WXH4.xml Biancani 1635] (very small font)
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/la/Vitruvius_1567_514_la.xml Vitruvius 1567] (very small font, mixed italics and upright)
   * [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/interface/echo/echoDocuView.xql?document=/echo/la/Archimedes_1565_YS05QMU8.xml Archimedes 1565] (mixed italics and upright)