wiki:Cutting out images

Version 8 (modified by Klaus Thoden, 13 years ago)


Instructions for cutting out images

  1. Cut the figures out of the TIFF images rather than the compressed JPG images in online_permanent/library on foxridge. The Digigroup should know where the relevant images are.
  2. Do not cut out drop caps or embellishments, e.g. decorative images on the title page.
  3. If there is already an XML version of the text, it is handy to extract a list of all figures that are to be cut out. You can do this, for example, by using XQuery in the display system:
    //echo:figure
    
    or
    //echo:image
    
    depending on how much context you want (with or without the caption).
  4. Now, in the viewing environment, browse through the book, visit the respective pages and mark each image using digilib's "zoom area" tool. Be careful not to include surrounding text, including the catchword at the bottom of the page. Captions, however, are to be cut out together with the image. Some small images are purely decorative; the policy is not to cut these out. Later on, they have to be deleted from the XML file.
  5. Save the URLs of the pages with the zoomed area into a text file. Also note in the file which images have to be removed from the XML. Store the list somewhere for future reference (e.g. in case the figures are later extracted on the fly instead of being produced by cutting the files). The first list was generated for Apian 1550.
  6. On the basis of this text file, the Python script cut_images.py (attached to this page) cuts the images out of the original TIFF files by calling ImageMagick's identify and convert commands. It saves them in the format page-imagenumber (e.g., if page image 0056.tif has three figures, they are saved as 0056-01.tif, 0056-02.tif and 0056-03.tif) in a folder called figures.
  7. This folder is then uploaded to the foxridge server alongside the pageimg folder of the respective book: the folder online_permanent/library/XXXXXXXX will then contain both a pageimg and a figures folder.
  8. Make sure that the excluded images are removed from the XML file.
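
Steps 5 and 6 can be sketched roughly as follows. This is not the attached cut_images.py, only an illustration of the idea under some assumptions: that the saved digilib URLs carry the page number in pn and the zoom area as fractions of the page in wx, wy, ww and wh, and that the pixel size of the TIFF has already been read (for example with ImageMagick's identify). The function names and the example URL are made up for this sketch.

```python
from urllib.parse import urlparse, parse_qs


def parse_zoom_area(url):
    """Return (page, wx, wy, ww, wh) from a digilib-style URL.

    Assumption: pn holds the page number and wx, wy, ww, wh hold the
    zoom area as fractions (0..1) of the full page.
    """
    query = parse_qs(urlparse(url).query)
    page = int(query["pn"][0])
    # Missing area parameters default to the full page.
    wx = float(query.get("wx", ["0"])[0])
    wy = float(query.get("wy", ["0"])[0])
    ww = float(query.get("ww", ["1"])[0])
    wh = float(query.get("wh", ["1"])[0])
    return page, wx, wy, ww, wh


def crop_geometry(width, height, wx, wy, ww, wh):
    """Convert a relative zoom area into an ImageMagick geometry string.

    width and height are the pixel dimensions of the TIFF page image.
    """
    x = round(wx * width)
    y = round(wy * height)
    w = round(ww * width)
    h = round(wh * height)
    return f"{w}x{h}+{x}+{y}"


def figure_name(page, index):
    """Name a figure page-imagenumber, e.g. 0056-01.tif."""
    return f"{page:04d}-{index:02d}.tif"


def convert_command(source, target, geometry):
    """Build the ImageMagick call as an argument list (not executed here)."""
    return ["convert", source, "-crop", geometry, "+repage", target]


# Hypothetical URL as it might be saved in the text file of step 5.
url = ("http://example.org/digilib?fn=/permanent/library/XXXXXXXX/pageimg"
       "&pn=56&wx=0.25&wy=0.5&ww=0.5&wh=0.25")
page, wx, wy, ww, wh = parse_zoom_area(url)
geometry = crop_geometry(2000, 3000, wx, wy, ww, wh)  # assumed 2000x3000 px TIFF
print(convert_command(f"pageimg/{page:04d}.tif",
                      f"figures/{figure_name(page, 1)}",
                      geometry))
```

The argument list could be passed to subprocess.run to do the actual cutting. Keeping the zoom area relative until the last moment means the same list of URLs works for page images of any resolution.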

Discussion

  1. Should an Ex libris be cut out?
  2. Should this be treated as one image: Apian 1550? Otherwise, spatial information might get lost (see text version of that page)
  3. Data entry tagged every single figure on this page. Should this be preserved? Probably yes, as some figures have variables attached (see text version)

Number of figures per document

(NB: The numbers are based on the figure tags in the XML document, which in turn are based on the fig tags typed in by the data entry firm. Deciding what counts as a figure is an intellectual process and cannot be left to the data entry firm. Thus, the number of figures per document can differ slightly.)

Book Figures
/echo/de/Adams_1785_S7ECRGW8.xml 108
/echo/de/Bernstein_1897_01-05_GGAGCX1B.xml 102
/echo/de/Bernstein_1897_06-11_PWVX6XFT.xml 117
/echo/de/Bernstein_1897_12-16_X323E11C.xml 68
/echo/de/Bernstein_1897_17-21_HQ8URX9E.xml 126
/echo/de/Bion_1765_TGXUZC1H.xml 434
/echo/de/Boskovic_1765_YPS3EYQ2.xml 29
/echo/de/Lehmann-Brockhaus_1983.xml 415
/echo/de/Specklin_1599_SSM0YQED.xml 130
/echo/en/Apollonius_1771_FDWQ9FD5.xml 27
/echo/en/Bacon_1670_WX8HY2V2.xml 19
/echo/en/Gravesande_1724_N1TU6UZF.xml 82
/echo/en/Wilkins_1684_TG3ZW27M.xml 17
/echo/fr/Belidor_1754_M1R3K3S6.xml 68
/echo/fr/Belidor_1757_R04RNX9Y.xml 68
/echo/fr/Berzelius_1819_WCWY69V2.xml 2
/echo/fr/Mersenne_1635_508_fr.xml 61
/echo/fr/Papin_1682_A8SP3HCB.xml 25
/echo/fr/Ufano_1628_QXRZU2BV.xml 68
/echo/fr/Varignon_1687_TP04WPNS.xml 62
/echo/fr/Vitruvius_1618_3XFC5KGV.xml 125
/echo/fr/Voltaire_1738_1FP6HWGK.xml 123
/echo/it/Alberti_1565_5PPYB69C.xml 105
/echo/it/Angeli_1668a.xml 27
/echo/it/Angeli_1668b.xml 13
/echo/it/Angeli_1671.xml 37
/echo/it/Benedetti_1579_507_it.xml 1
/echo/it/Bianconi_1746.xml 5
/echo/it/Casati_1685_1YZKBTHR.xml 75
/echo/it/Cataneo_1567_DSDY9XH0.xml 69
/echo/it/Cataneo_1572_ZBAS6ZM1.xml 129
/echo/it/Cavalieri_1632_CE3XGS5P.xml 30
/echo/it/Gallaccini_1767_D09WWP72.xml 224
/echo/it/Heron_1601_M5C8103Y.xml 34
/echo/it/Vitruvius_1524_ZFRVKXMF.xml 135
/echo/it/Vitruvius_1556_XYTWCGV1.xml 168
/echo/it/Vitruvius_1747_Y1G1TRCW.xml 14
/echo/it/Zanotti_1752_16HBZHF5.xml 2
/echo/it/Zonca_1656_UR271U6Y.xml 55
/echo/la/Angeli_1659.xml 92
/echo/la/Apian_1541_9TE6563P.xml 12
/echo/la/Apian_1550_PUBSU9QD.xml 69
/echo/la/Apollonius_1661_1X8T70WB.xml 526
/echo/la/Archimedes_1565.xml 151
/echo/la/Archimedes_1565_YS05QMU8.xml 38
/echo/la/Aristoteles_1547.xml 13
/echo/la/Aristoteles_1548_9NN63YC9.xml 1
/echo/la/Barrow_1674.xml 40
/echo/la/Benedetti_1585.xml 444
/echo/la/Bernoulli_1738_AZ870BWE.xml 28
/echo/la/Biancani_1635_GWS4WXH4.xml 147
/echo/la/Casati_1686_UEY6QQZ7.xml 18
/echo/la/Cataneo_1600.xml 117
/echo/la/Cavalieri_1653.xml 370
/echo/la/Clavius_1581_MXTKM8TF.xml 432
/echo/la/Clavius_1586.xml 372
/echo/la/Clavius_1591_DP9UZA52.xml 104
/echo/la/Clavius_1606_FBVYV7EH.xml 294
/echo/la/Ghetaldi_1603_FQPFR8XP.xml 23
/echo/la/Gravesande_1721_KN9XTZRQ.xml 98
/echo/la/Heron_1680_C3M2XK8N.xml 98
/echo/la/Huygens_1724_1_BYCAB3V6.xml 168
/echo/la/Huygens_1724_2_Y97ESDAP.xml 241
/echo/la/Musschenbroek_1729_H9ZYCGQ0.xml 32
/echo/la/Stevin_1605_527_la.xml 251
/echo/la/Vitruvius_1490_4YSU4X91.xml 1
/echo/la/Vitruvius_1511_XS9KA6WS.xml 136
/echo/la/Vitruvius_1543_T05R2RPS.xml 91
/echo/la/Vitruvius_1544_2UZM8E2N.xml 64
/echo/la/Vitruvius_1552_UNCWSTHE.xml 107
/echo/la/Vitruvius_1567_514_la.xml 174
/echo/la/Vitruvius_1800_V82APKX9.xml 1
/echo/la/Viviani_1659.xml 272
/echo/la/Weidler_1726_NSW2F6PF.xml 17
/echo/la/Zubler_1607_DNYGYWGH.xml 20
/echo/la/alvarus_1509.xml 14
/echo/zh/SongYingxing_1637.xml 160
Total: 8635
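
Counts like the ones above could be reproduced outside the display system with a few lines of Python. The sketch below counts the elements matched by //echo:figure in a single document. The namespace URI bound to the echo prefix and the structure of the sample document are assumptions made for illustration; check the root element of the real files before relying on them.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the echo prefix (verify against the real files).
ECHO_NS = {"echo": "http://www.mpiwg-berlin.mpg.de/ns/echo/1.0/"}


def count_figures(xml_text):
    """Count all echo:figure elements below the root of one document."""
    root = ET.fromstring(xml_text)
    return len(root.findall(".//echo:figure", ECHO_NS))


# Made-up sample document; only the figure and image element names
# are taken from the queries in the instructions.
sample = """<echo:echo xmlns:echo="http://www.mpiwg-berlin.mpg.de/ns/echo/1.0/">
  <echo:text>
    <echo:figure><echo:image file="0056-01"/><echo:caption>Fig. 1</echo:caption></echo:figure>
    <echo:figure><echo:image file="0056-02"/></echo:figure>
  </echo:text>
</echo:echo>"""

print(count_figures(sample))  # prints 2
```

Running this over every file and summing the results would give the total, with the caveat from the note above that the tags reflect the data entry firm's decisions.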

Attachments (5)