Instructions for cutting out images

The figures should be cut out of the TIFF images rather than the compressed JPG images in online_permanent/library on Foxridge. The Digigroup should know where the relevant images are.
Do not cut out drop caps or embellishments, except for decorative images on the title page.
If there is already an XML version of the text, it is handy to extract a list of all figures that are to be cut out. You can do this, for example, by using XQuery in the display system:
```
//echo:figure
```
or
```
//echo:image
```
Which query to use depends on how much context you want (with caption or not). A list of all figures in ECHO has been extracted (see the attachment echo-figures.xml) and converted to HTML (echo-figures.html), which contains links to each page containing figures. Note that a link might take you to the page after the picture (this happens if the link is in a float-div).
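If the display system is not at hand, a similar list can be produced offline. The following is a minimal sketch, not the project's tooling: it counts elements by local name so that no particular ECHO namespace URI has to be assumed. The sample document, its namespace URI and the `file` attribute are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the namespace URI and the "file" attribute are made up.
SAMPLE = """<echo:text xmlns:echo="http://example.org/echo">
  <echo:figure><echo:image file="0056-01"/><echo:caption>Fig. 1</echo:caption></echo:figure>
  <echo:figure><echo:image file="0056-02"/></echo:figure>
</echo:text>"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def count_elements(xml_text, name="figure"):
    """Count elements with the given local name, whatever their namespace."""
    root = ET.fromstring(xml_text)
    return sum(
        1
        for el in root.iter()
        if isinstance(el.tag, str) and local_name(el.tag) == name
    )


if __name__ == "__main__":
    print("figures:", count_elements(SAMPLE, "figure"))
    print("images:", count_elements(SAMPLE, "image"))
```

Matching on the local name keeps the sketch independent of the namespace declaration; for a real ECHO file, read the file contents and pass them to `count_elements`.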
Now, in the viewing environment, mark the images using digilib's "zoom area" tool. Be careful not to cut out surrounding text, including the catchword at the bottom of the page. Captions, however, are to be cut out as well. It is sometimes advisable to first cut out a bigger section around the figure. Some small images are only there for decorative purposes; the policy is not to cut these out. Later on, they have to be deleted from the XML file.
Save the URLs of the pages with the zoomed area in a text file. Keyboard shortcuts come in handy here: cmd-l, cmd-c, cmd-w select and copy the link in the address bar and close the tab, cmd-TAB switches to a text editor, and cmd-v inserts the link. To note in the file which images have to be removed from the XML, copy the URL to the text file as well, but put a # in front of it. You can also write other comments into this file, but be sure to begin each comment line with a #. Save the resulting list in a new directory on the same level as the raw and the xml directories (see trunk/texts/WO_1/Stevin_1605 for an example). Once trained, the average speed for cutting out figures is about 2.5 figures per minute (Stevin_1605 was completed in 2 hours).
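The conventions of this list file can be sketched in a few lines of Python. This is an illustration only, not the attached script. It assumes that the saved links carry digilib-style query parameters (`fn`, `pn`, `wx`, `wy`, `ww`, `wh`, with the area given as fractions of the page) and that a line starting with # followed by a URL marks a figure to be removed from the XML. The host name and paths are hypothetical.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical list file content; host and paths are made up.
SAMPLE_LINES = [
    "# Stevin 1605, first pass",
    "http://example.org/digilib.jsp?fn=/library/XYZ/pageimg&pn=56&wx=0.1&wy=0.2&ww=0.5&wh=0.25",
    "#http://example.org/digilib.jsp?fn=/library/XYZ/pageimg&pn=57&wx=0.4&wy=0.1&ww=0.1&wh=0.1",
    "",
]


def parse_figure_list(lines):
    """Split list-file lines into URLs to cut, URLs to exclude, and notes."""
    keep, exclude, notes = [], [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            body = line.lstrip("#").strip()
            # A commented-out URL marks a figure to remove from the XML.
            if body.startswith(("http://", "https://")):
                exclude.append(body)
            else:
                notes.append(body)
        else:
            keep.append(line)
    return keep, exclude, notes


def zoom_area(url):
    """Read page number and relative zoom area from the query string."""
    query = parse_qs(urlparse(url).query)

    def value(key, default):
        return query.get(key, [default])[0]

    return {
        "fn": value("fn", ""),
        "pn": int(value("pn", "1")),
        "wx": float(value("wx", "0")),
        "wy": float(value("wy", "0")),
        "ww": float(value("ww", "1")),
        "wh": float(value("wh", "1")),
    }
```

Keeping the excluded URLs in a separate list makes it easy to check later that the corresponding figures were really removed from the raw text and the XML file.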
On the basis of this text file, the Python script cut_figures.py cuts the images out of the original TIFF files by calling ImageMagick's identify and convert commands and saves them in the format page-imagenumber. For example, if page image 0056.tif has three figures, they are saved as 0056-01.tif, 0056-02.tif and 0056-03.tif. The results are stored in a folder called figures.
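The naming scheme and the cropping step can be sketched as follows. This is not the code of the script itself, only an illustration under stated assumptions: the zoom area is given as fractions of the page (wx, wy, ww, wh), and the pixel size of the page image is already known, for example from `identify -format "%w %h"`. The function names are hypothetical. The sketch only builds the `convert` command line and does not run it.

```python
from collections import defaultdict


def output_names(pages, ext="tif"):
    """Number the figures per page, e.g. 0056-01.tif, 0056-02.tif, ..."""
    seen = defaultdict(int)
    names = []
    for page in pages:
        seen[page] += 1
        names.append(f"{page}-{seen[page]:02d}.{ext}")
    return names


def crop_command(src, dst, size, area):
    """Build an ImageMagick 'convert' call for a relative area.

    size is (width, height) in pixels; area is (wx, wy, ww, wh) as
    fractions of the page, an assumption about how the zoom is stored.
    """
    width, height = size
    wx, wy, ww, wh = area
    geometry = "{}x{}+{}+{}".format(
        round(ww * width),   # crop width in pixels
        round(wh * height),  # crop height in pixels
        round(wx * width),   # x offset of the upper left corner
        round(wy * height),  # y offset of the upper left corner
    )
    # +repage drops the virtual canvas offset left over from the crop.
    return ["convert", src, "-crop", geometry, "+repage", dst]


if __name__ == "__main__":
    names = output_names(["0056", "0056", "0056"])
    cmd = crop_command("0056.tif", "figures/" + names[0],
                       (2000, 3000), (0.1, 0.2, 0.5, 0.25))
    print(" ".join(cmd))
```

Returning the command as a list makes it suitable for `subprocess.run(cmd, check=True)`, which raises an exception when `convert` exits with an error instead of silently skipping a figure.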
1. The following error might occur when the script runs:
```
convert: AnErrorHasOccurredReadingFromFile `/Volumes/online_permanent/archimedes_repository/large/stevi_stati_527_la_1605/527-01-pageimg/527.01.139.jpg': Bad file descriptor @ constitute.c/ReadImage/575.
convert: missing an image filename `/Volumes/online_permanent/archimedes_repository/large/stevi_stati_527_la_1605/figures/527.01.139-02.jpg' @ convert.c/ConvertImageCommand/2775.
```
2. In that case, rename the directory on Foxridge, run the script once more and merge the directories. Repeat this until the number of extracted figures reported by the script matches the number of files in the figures directory.
Make sure that the excluded images are removed from both the raw text and the XML file.
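Comparing the two numbers by hand only shows that something is missing, not what. A small helper along the following lines could list the missing files directly. It is a sketch with a hypothetical function name, and it assumes that the expected file names are known, for example from the list file.

```python
import os


def missing_figures(expected_names, figures_dir):
    """Return the expected file names that are absent from figures_dir."""
    present = set(os.listdir(figures_dir))
    return sorted(set(expected_names) - present)
```

An empty result means that every expected figure is present in the directory; otherwise only the pages behind the listed names need to be processed again.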

Discussion

Should an Ex libris be cut out?
- Answer: No
Should this be treated as one image: Apian 1550? Otherwise, spatial information might get lost (see the text version of that page).
Data entry tagged every single figure on this page. Should this be preserved? Probably yes, as some figures have variables attached (see the text version).
Ornamental elements like this? Maybe the ornament on the title page should always be cut out for eye-candy reasons (because at some point in the future the viewer may display this page as the default first page). However, it is missing in Benedetti 1585.
Hypothesis to be checked: before the data entry (DE) firm was required to type variables, multi-figure pages were not divided into parts.

Number of figures per document

(NB: the numbers are based on the figure tags in the XML document, which in turn are based on the fig tags typed in by the data entry firm. Deciding what counts as a figure is an intellectual process that cannot be left to data entry. Thus, the number of figures per document can differ slightly.)

Book Figures
/echo/de/Adams_1785_S7ECRGW8.xml 108
/echo/de/Bernstein_1897_01-05_GGAGCX1B.xml 102
/echo/de/Bernstein_1897_06-11_PWVX6XFT.xml 117
/echo/de/Bernstein_1897_12-16_X323E11C.xml 68
/echo/de/Bernstein_1897_17-21_HQ8URX9E.xml 126
/echo/de/Bion_1765_TGXUZC1H.xml 434
/echo/de/Boskovic_1765_YPS3EYQ2.xml 29
/echo/de/Lehmann-Brockhaus_1983.xml 415
/echo/de/Specklin_1599_SSM0YQED.xml 130
/echo/en/Apollonius_1771_FDWQ9FD5.xml 27
/echo/en/Bacon_1670_WX8HY2V2.xml 19
/echo/en/Gravesande_1724_N1TU6UZF.xml 82
/echo/en/Wilkins_1684_TG3ZW27M.xml 17
/echo/fr/Belidor_1754_M1R3K3S6.xml 68
/echo/fr/Belidor_1757_R04RNX9Y.xml 68
/echo/fr/Berzelius_1819_WCWY69V2.xml 2
/echo/fr/Mersenne_1635_508_fr.xml 61
/echo/fr/Papin_1682_A8SP3HCB.xml 25
/echo/fr/Ufano_1628_QXRZU2BV.xml 68
/echo/fr/Varignon_1687_TP04WPNS.xml 62
/echo/fr/Vitruvius_1618_3XFC5KGV.xml 125
/echo/fr/Voltaire_1738_1FP6HWGK.xml 123
/echo/it/Alberti_1565_5PPYB69C.xml 105
/echo/it/Angeli_1668a.xml 27
/echo/it/Angeli_1668b.xml 13
/echo/it/Angeli_1671.xml 37
/echo/it/Benedetti_1579_507_it.xml 1
/echo/it/Bianconi_1746.xml 5
/echo/it/Casati_1685_1YZKBTHR.xml 75
/echo/it/Cataneo_1567_DSDY9XH0.xml 69
/echo/it/Cataneo_1572_ZBAS6ZM1.xml 129
/echo/it/Cavalieri_1632_CE3XGS5P.xml 30
/echo/it/Gallaccini_1767_D09WWP72.xml 224
/echo/it/Heron_1601_M5C8103Y.xml 34
/echo/it/Vitruvius_1524_ZFRVKXMF.xml 135
/echo/it/Vitruvius_1556_XYTWCGV1.xml 168
/echo/it/Vitruvius_1747_Y1G1TRCW.xml 14
/echo/it/Zanotti_1752_16HBZHF5.xml 2
/echo/it/Zonca_1656_UR271U6Y.xml 55
/echo/la/Angeli_1659.xml 92
/echo/la/Apian_1541_9TE6563P.xml 12
/echo/la/Apian_1550_PUBSU9QD.xml 69
/echo/la/Apollonius_1661_1X8T70WB.xml 526
/echo/la/Archimedes_1565.xml 151
/echo/la/Archimedes_1565_YS05QMU8.xml 38
/echo/la/Aristoteles_1547.xml 13
/echo/la/Aristoteles_1548_9NN63YC9.xml 1
/echo/la/Barrow_1674.xml 40
/echo/la/Benedetti_1585.xml 444
/echo/la/Bernoulli_1738_AZ870BWE.xml 28
/echo/la/Biancani_1635_GWS4WXH4.xml 147
/echo/la/Casati_1686_UEY6QQZ7.xml 18
/echo/la/Cataneo_1600.xml 117
/echo/la/Cavalieri_1653.xml 370
/echo/la/Clavius_1581_MXTKM8TF.xml 432
/echo/la/Clavius_1586.xml 372
/echo/la/Clavius_1591_DP9UZA52.xml 104
/echo/la/Clavius_1606_FBVYV7EH.xml 294
/echo/la/Ghetaldi_1603_FQPFR8XP.xml 23
/echo/la/Gravesande_1721_KN9XTZRQ.xml 98
/echo/la/Heron_1680_C3M2XK8N.xml 98
/echo/la/Huygens_1724_1_BYCAB3V6.xml 168
/echo/la/Huygens_1724_2_Y97ESDAP.xml 241
/echo/la/Musschenbroek_1729_H9ZYCGQ0.xml 32
/echo/la/Stevin_1605_527_la.xml 251
/echo/la/Vitruvius_1490_4YSU4X91.xml 1
/echo/la/Vitruvius_1511_XS9KA6WS.xml 136
/echo/la/Vitruvius_1543_T05R2RPS.xml 91
/echo/la/Vitruvius_1544_2UZM8E2N.xml 64
/echo/la/Vitruvius_1552_UNCWSTHE.xml 107
/echo/la/Vitruvius_1567_514_la.xml 174
/echo/la/Vitruvius_1800_V82APKX9.xml 1
/echo/la/Viviani_1659.xml 272
/echo/la/Weidler_1726_NSW2F6PF.xml 17
/echo/la/Zubler_1607_DNYGYWGH.xml 20
/echo/la/alvarus_1509.xml 14
/echo/zh/SongYingxing_1637.xml 160
Total: 8635

Last modified on Dec 16, 2010, 1:15:38 PM

Attachments (5)

echo-figures.xml (3.1 MB) - added by Klaus Thoden 15 years ago. An xml document containing the xquery of "echo:figure" of all 78 ECHO documents
echo-figures.html (4.1 MB) - added by Klaus Thoden 15 years ago. HTMLized Xquery results for all figures in ECHO documents
arch_cut_images.py (5.5 KB) - added by Klaus Thoden 15 years ago. Same tool for Archimedes files, one day, it will be one tool for all
cut_images.py (5.5 KB) - added by Klaus Thoden 15 years ago. A bit more comfortable
Alvarus_1509_YHKVZ7B4.fig (2.4 KB) - added by Klaus Thoden 15 years ago. Figure coordinates for Alvarus
