OCR evaluation

The workflow is going to be adapted to accept OCRed text as input. The OCR engine will be OCRopus.

Tutorial video and other videos

The documents used in the previous workflows were assessed for how well they are likely to perform under OCR.

Command overview

The following commands (taken from the video above) recognize English text:

  1. ocropus-binarize 035.jpg
  2. ocropus-pseg book/????.png
  3. ocropus-lattices -m OCRopus/ocropy/2m2-reject.cmodel book/0001/??????.png
  4. ocropus-align -l OCRopus/ocropy/data/default.fst book/0001/??????.fst
  5. ocropus-hocr book/
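The five steps can be scripted so that each one runs only after the previous step has written its files. The following is a minimal Python sketch. It assumes the ocropus-* tools are on the PATH and that binarization writes the page image to book/0001.png, which is what the globs above imply. The model paths are copied from the commands and will probably need adjusting. The shell expands the ? patterns in the original commands, so the sketch expands them with glob and needs no shell. The names expand, pipeline_steps and run are hypothetical helpers.

```python
import glob
import subprocess
import sys

# Model paths copied from the command overview; adjust them to your checkout.
MODEL = "OCRopus/ocropy/2m2-reject.cmodel"
LANGMODEL = "OCRopus/ocropy/data/default.fst"


def expand(pattern):
    """Expand a shell-style ? / * pattern the way the shell would (sorted)."""
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"no files match {pattern!r}")
    return matches


def pipeline_steps(image, book="book", page="0001"):
    """Yield the argv of each step in order.

    This is a generator, so each glob is expanded only when the caller asks
    for that step, i.e. after the previous step has written its files.
    """
    yield ["ocropus-binarize", image]
    yield ["ocropus-pseg", *expand(f"{book}/????.png")]
    yield ["ocropus-lattices", "-m", MODEL, *expand(f"{book}/{page}/??????.png")]
    yield ["ocropus-align", "-l", LANGMODEL, *expand(f"{book}/{page}/??????.fst")]
    yield ["ocropus-hocr", f"{book}/"]


def run(image, book="book", page="0001"):
    """Run the five steps, stopping at the first one that fails."""
    for argv in pipeline_steps(image, book, page):
        print("+", " ".join(argv), file=sys.stderr)
        subprocess.run(argv, check=True)
```

Calling run("035.jpg") from the directory where the commands would otherwise be typed reproduces the sequence above. Like the original commands, the sketch handles only the first page directory, book/0001.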

First results

This is a page of Bacon 1670 (link).

OCRopus 0.4

With the commands above, the output is poor (XHTML file attached):

116
tha| yet it cannot be for other - Kcafo+: - ~r f1r(l, it is found 9tit upon
ccrta;n Trccs; and thofe Trces bear no fuch Fruit, os may allure that |rd to
B|rrics, and fo is of(cn found thcrc| which may have givcn occaf1on tothc
tale. But that which maketh an cnd of th|quc(lion is, that M1(Tcltoe hath
w|nld go into a Bv@gh( ?nd bcf1dcs/it feemcrh to bc morcfat and un@uous,
th1n the ordinary Sap of thc T!5c| both py thc Bcrry which is clammy, and
by that it connnucth grccn winter and Summcr, which Ihc Trcc do(h
not.
tryal would bc m-c,b|riPPing vf thc Bough of aCrab~cc inthe Ba~and
likely, to try it with fomeothcr watering or anointing, . that w7c notfonatu-
things askill not me Bongh.
1t wcrc good to try, what T\&nM would put forth, @ thcy bc fo|biddcn
Io p@tfottb thcir natn|l Boughs: powl thcreforc a Tree; .andcover it,(omc
will Dut forth Roots 9 for fo will a Cions, bcir@ turncd down into clav.
fhat is .not fo f1atnral to thcplant as Clay isg tryit with Lcathcr,or Clotb, or
becn knodn t|gtow o\&t of apol9rd-
.A Man may count the prickcs / Trccs to bcakinde d Excrcfccnce,for
the/will ncvcr be@onghs| nor be Lcaves. @c plants that have p.ricklcs,
are Thorns, Black and whitc. s Bryer, Rofc, Ltmmon"trecsv Crab-trccq
Pricklcs in thc Lc2f .are, H9lly5 Juniper, vhin .bu(h, ~hi(bc i Ncttlcs alfo
have a fiall vencmous Fr~Ici fo hath Borrage, but harmlefs. Thc . @ufc
For the ha@c of the5pirit to put "Th, a@d the w|9t ot nourimmcty toput
forth a Bough, and thc clof~fs of thc Balks caufc Frickles .in Bonghsg 0nd
the Leav@s othcrw.!fcarc Wygh,as Burragc and Nettles are. As (or thc Leavcs
of Holl)|, they arefmooth, but nevcr Plain, bnr asit wcrc with folds for thc2
amc caufc.
56o. Therc be alfo Tlots, thavhough the? havc no ?~kles, yet they havea
kinde of Downey 9r Velvct Rine upop their Leafes; as Refe-C\<mpi0@ |te~
Cdrh2isns,,
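The attached XHTML file is the output of ocropus-hocr, which is hOCR markup. A small parser can turn it into plain text like the listing above by collecting the text of each line element. The sketch below uses Python's standard html.parser and assumes that lines are marked with the standard hOCR class ocr_line, so check the attached file before relying on it. HocrLines is a hypothetical name.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag in HTML, so they must not change depth.
VOID = {"br", "hr", "img", "meta", "link", "input"}


class HocrLines(HTMLParser):
    """Collect the text of every element whose class list contains ocr_line."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._depth = 0   # > 0 while inside an ocr_line element
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self._depth:
            self._depth += 1          # nested element (e.g. a word) inside a line
        elif "ocr_line" in (dict(attrs).get("class") or "").split():
            self._depth = 1           # entering a new line
            self._buf = []

    def handle_endtag(self, tag):
        if tag in VOID or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:          # left the line: store whitespace-normalized text
            self.lines.append(" ".join(" ".join(self._buf).split()))

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)
```

To use it, create a HocrLines instance, pass the file contents to its feed method, and join the collected lines attribute with newlines. Text chunks are joined with spaces so that adjacent word elements do not run together. The trade-off is that a word split by inline markup also gains a space.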

OCRopus 0.2

The results are better here. The command was ocroscript rec-tess --tesslanguage=eng bacon_0150.jpg, and with the English language setting the long s is not recognized (it mostly comes out as f).

ye: in cannot bc for other Rcafons : For Hrfi, it is found bur upon
ccrrain Trccs; and chofc Trccs bear no fuch Fruit. as may allure char Blrd to
{lr andfécd uponnhcm. It may bc, that Bird fccdcrh uponrbc Miffclrdv
Berries, and fo is cfrcn found there ; which may have given occauon tothe
male. Bur chat which maketh an end of rhequellion is, that Mrlfelroe lmh
been found ru pur forth under the B0ughs,and not (oncly) aboverhe Bnnghg; lé
fo it cannotbe anything that fallcth upon the Bough. Mifltoe growerh
chiclly upon Crab-trees, Applcstrecs, fometimes upon H»{les, and rarely
upon Oaks; the Miilcltoc whereof is counted very Medicinal. lt is ever
reenWrntcr and Summer, and beareth a white gllllering Berry s and ir is a
B, Planr, utterly diEcringfrom the Planr, upon which ir groweth. Two things lt bb
tlzerefore may be certainly ietdown; Firll, that Superfxtation mue y
abundance of$ap, in the Bough that putteth it forth; Secondly, that that
Sap muh be {rich as the Tree doth excern, and cannot aflimilate, for elfe it
would go into aBough; and beGdes,it feemeth to be more fat and unétuous,
than the ordinary Sap Of the Tree; both l>y`thcBerry which is clammy, and
by that it continueth green Winter and bummer, which tne lree cloth
not.
This Expqvimcntof Miykltoc may give light to other praétices; therefore

tryal would be made,by ripping of the Bough of aCrab-tree in the Batl<,_and dd fif i
watering ofthe Wound every day, with warm water unge, to ee t
would bring forth Mifleltoe, or any fuehlikething. But it were yet more
likely, to tty it with fomeother watering or anointing, that were notfontitn. Shbfh
tal to the Treeas Wateris; as Oyl, or BarmofDrink, dec. o tey e uc
things askill not the Bough.
It weregood to try, what ‘TI¢m: would put forth, if they be forbidden


to putforth their natural Boughs: Powl therefore a Tree, and c_overit,fome
thicknefs withClay on the top, and fee what it will put forth. I luppole it
will put forth Roots; for fo will a Cions, beirzg turned down into Clay. fd h fh
Therefore in this Experiment alfo, the Tree would be cloewitomewat
that isnor fo natural to the Plantas Clay is; tryit with Leather, or Cloth, or
Painting, foitbe not hurtful tothe Tree. And it is certain, that aBrake hath
been known to- grow out of a Pollard.
AMan may count the Prickes of Trees to be a kinde of Excrefcence,for

they will never be Boughs, not bear Leaves. The Plants that have Prickles, fLCb
are Thorns, Black and White ; Bryer, Roe, emmon-trees, ra-trees;
Goosbetry, Betberry ; thefe have it inthe Bough. The Plants that have '
Ptiekles inthe Leaf are, Holly,]uniper, \Vhin·bulh, lhillle ; Netrles alfo
haveafmallvenemous1’ricl<le; fohathBorrage, but harmlefs. The caufe
muft be, hafiy putting forth, want of moifiure,and the clofcnefs of the Bark:
For the hafle ofthe Spirittopur forth. and the want of nourifhment to put
forth a Bough, andthe clofenefs of the Bark, caufe Prickles in Boughs; and
therefore they are ever like a Tyrnmis, for that the moifiure fpendeth after a lit. ff ifh
tle putting forth. And for Prickles in Leaves, they come alo oputtng ortfhdhf
more ]uyceintothe Leaf that can fpred in the Leaf mooa; antereore
, { the Leaves otherwife are rough,as Burrage and Nertles are. As for the Leaves ih fldfh
of Holly, they arefmooth, but never plain , but asit were wtos or te
fame caufe
, . T l There be alfo {Plants, that though they have no·Pricltles, yet they have a
kinde of Downey or Velvet Rine upontheir Leafes; as Rafe-Campion Sterli- fbSi
Gilliflomrs, cali:-fm; which Down orNap cometh ofa utile pirit, n a
foft or fat lubfiance. For it iscertain. that both S:vtk»Gilli]lo1v¢r$, and Rap--
Campimr,
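Because the English model cannot produce the long s, words such as fuch and thofe come out with an f. One possible post-correction, not tried on this page, is to switch an f back to s only when the change turns a non-word into a dictionary word. The following is a sketch that assumes you supply a lowercase word list; the lexicon in any example is illustrative only. It skips word-final f because the long s was not used at the end of a word, and it skips capital F because there is no capital long s. unfold_long_s and unfold_text are hypothetical names.

```python
import itertools
import re


def unfold_long_s(word, lexicon):
    """Turn some 'f's of word back into 's' if that yields a lexicon entry.

    Words already in the lexicon (of, Fruit, ...) are returned unchanged,
    as are words for which no f/s combination is found.
    """
    if word.lower() in lexicon:
        return word
    # Candidate positions: lowercase f, never in word-final position.
    positions = [i for i, ch in enumerate(word) if ch == "f" and i < len(word) - 1]
    if not positions or len(positions) > 4:   # cap the 2**k search
        return word
    for choice in itertools.product("fs", repeat=len(positions)):
        chars = list(word)
        for pos, ch in zip(positions, choice):
            chars[pos] = ch
        candidate = "".join(chars)
        if candidate.lower() in lexicon:
            return candidate
    return word


def unfold_text(text, lexicon):
    """Apply unfold_long_s to every alphabetic token in text."""
    return re.sub(r"[A-Za-z]+", lambda m: unfold_long_s(m.group(0), lexicon), text)
```

The original word is checked first, so real words containing f are never touched. A word that has other misreadings as well as the f will still not be repaired.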

Another example

This is a page of Wilkins 1684:

OCRopus 0.4

Raw OCR:

48 mttIf uaa |J"|5rM1 
D~@ii&9yb us th|q|~yf,PmmfM?Y5-T?!
HoweVcr, the Ppnc eny wt 9b ft| 
L1ve at an eaHer 1iHe, Dy !ccu9% U9 Mc |- 
Lofs iii being depr|d .dt thR p|y|cK, . u?Ps 
only, |nd receiVi9g if ~e|.y9urifh~Mj 
at fome times we had the pr1~ggT to ne" 
and fof . this Very Reai~ uaysM wapm9es
it| Thenindeed Pbil# the Rw thin|s it w9Wd 
Enabled to tarrd9rty Day|an.ooqy Nigptg
in the Mount without eating any pMng, we- 
/aefeb@e, Plat#, ~~ and qpFrs. ? -put De.. 
&ufe he there heard the .Mdody bt tne -Fh 
cadfe it is not no~ Ithink, .Adirm.d ~ anyi 
veus.-Rifam teneat|e. -I Rn9w tlMs M|@ck 
I fhall not therefore beflow either Pains or 
Time in arguing againR it. 
Lath hadgreat P4trons, pom @acFd.ang PP- 
It ma| (ildic| t@atoaVeonb Named mefe 
ma1F - Autb9rsJLo as o|Y~ @rde, PRF,
Three l0(% a94 foE thT two 1ee eeiF~ 
as may make it po|f1ble tobe Inhabited, and 
have refbrred the Keader to othcrs tor iaus- 
what thofeCualities are whereinit moren|ar- 
fadioti I ihaR in the next p~ procee4 td 
ly A|rees ah our Ear& 
pROP n& 
the Natd|e of theMo@ns Body,to knoP bhe- 
TA2t tIk |0|n | d wa CwnNde|l, Op~d 
Bod|
ther that be capibm ofany fildh conditions2 
Ir|%t:2|&ff2Xf22@ 
agrccd 

OCRopus 0.2

33 Thai the M0012 my he a World.
However, thc World would have no gYe2t
Lol? io being depriv’d of this Mulick, unlefs
gt: fome times we had che priviledge co hear
it; z Then indeed Philo the Jew thinks it would

DE [,,,,,,H,_ [ave us the (ghages of_ Dy;et,__ and wemight
Live at an eaher Kate, by feeding on the har
only, and receiving no other Nourifhment;
and for this very Reafon (Qysahe) was Moks
Enabled to tarry Forty Days andxliorty Nights
in the Mount without eating anything, be·
caufe he there heard the Melody of the Heaé
vens.—-Rifom tmront. I know this Muick
hath had great Patrons, both Sacred and Pro—
phane Authors,fuch as zlmorojé, Beds, Boerzoy,
Aoeklme, Ploto, Cicero, and others; but be··
caufe it is not now, 1 think, Afhrnfd by any,
I {hall not therefore bellow either Pains or
Time in arguing againlt it.
It may fuilice that I have only Named thefe

Three lali, and for the two more neceiiary,
have referred the Reader to others for fatis-
faétion. I {hall in the next place Proceed to
the Nature of the Moons Body, to know whe-
ther that be Capable of any fuch Conditions,
as may make it polllble to be Inhabited, and
what thofe Qualities are wherein it more near»»
ly Agrees with our Earth.

P R O P. IV.,

Tho; the Mano is o Solid, Compoéird, Oporoos
B0dy¤
I Shall not need to {land long in the Proof of is this Propoixtion, {ince it is a Truth already
agreed
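The comparisons above are made by eye. One common way to put a number on them (not used on the original page) is the character error rate: the Levenshtein edit distance between the OCR output and a hand-typed ground-truth transcription, divided by the length of the transcription. The following is a sketch. The ground truth has to be prepared separately, and the score depends on transcription choices such as whether the long s is kept as ſ or normalized to s. levenshtein and cer are hypothetical helper names.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions that turn a into b (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def cer(ocr, truth):
    """Character error rate of ocr measured against the ground-truth text."""
    if not truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ocr, truth) / len(truth)
```

Running cer on the OCRopus 0.4 output and the 0.2 output of the same page, against one shared transcription, gives a direct comparison. The quadratic algorithm is acceptable for a single page but slow in pure Python for whole books.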
Last modified on Jul 21, 2011, 4:32:50 PM

Attachments (1)
