Version 5 (modified by Klaus Thoden, 13 years ago)
Script more or less finished.

Instructions for cutting out images

Cut the figures out of the TIFF images rather than the compressed JPG images in online_permanent/library on foxridge. The Digigroup should know where the relevant images are.
Do not cut out drop caps or embellishments, e.g. decorative images on the title page.
If an XML version of the text already exists, it is handy to extract a list of all figures that are to be cut out. One way to do this is with XQuery in the display system:
```
//echo:figure
```
or
```
//echo:image
```
Which one to use depends on how much context you want (with or without the caption).
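Outside the display system, the same list could be produced from a local copy of the XML file. The following is only a minimal sketch using Python's standard library. The namespace URI bound to the echo prefix, the root element name and the file attribute in the sample are assumptions made for illustration and should be checked against the actual document.

```python
import xml.etree.ElementTree as ET

# Assumption: the URI bound to the "echo" prefix. Check the root element
# of your own file and adjust if it differs.
NS = {"echo": "http://www.mpiwg-berlin.mpg.de/ns/echo/1.0/"}


def list_figures(xml_text, with_caption=True):
    """Return all echo:figure (or echo:image) elements in document order."""
    root = ET.fromstring(xml_text)
    tag = "echo:figure" if with_caption else "echo:image"
    return list(root.iterfind(f".//{tag}", NS))


# Hypothetical sample document, for illustration only.
SAMPLE = """<echo:echo xmlns:echo="http://www.mpiwg-berlin.mpg.de/ns/echo/1.0/">
  <echo:figure>
    <echo:image file="0056-01"/>
    <echo:caption>Fig. 1</echo:caption>
  </echo:figure>
  <echo:figure>
    <echo:image file="0056-02"/>
  </echo:figure>
</echo:echo>"""

if __name__ == "__main__":
    for element in list_figures(SAMPLE):
        print(ET.tostring(element, encoding="unicode"))
```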
Now browse through the book in the viewing environment, visit the respective pages, and mark the images using digilib's "zoom area" tool. Be careful not to cut out surrounding text, including the catchword at the bottom of the page. Captions, however, are to be cut out as well. Some small images are purely decorative; the policy is not to cut these out. Later on, they have to be deleted from the XML file.
Save the URLs of the pages with the zoomed area in a text file. Also note in the file which images have to be removed from the XML. Store the list somewhere for future reference (e.g. in case the figures are later produced not by cutting the files but by extracting them on the fly). The first list was generated for Apian 1550.
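To illustrate what such a saved URL carries, here is a hedged sketch that reads the zoom area back out of it. It assumes the URLs use digilib's named query parameters (fn for the image path, pn for the page number, and wx, wy, ww, wh for the zoom area as fractions of the page) and that missing parameters default to the full page. The host name in the example is made up, and the format actually expected by cut_images.py may differ.

```python
from urllib.parse import urlparse, parse_qs


def parse_zoom_url(url):
    """Extract the page and the relative zoom area from a digilib-style URL."""
    query = parse_qs(urlparse(url).query)

    def first(name, default):
        # parse_qs returns a list per parameter; take the first value.
        return query.get(name, [default])[0]

    return {
        "fn": first("fn", ""),
        "pn": int(first("pn", "1")),
        # Assumed defaults: the whole page when no area is given.
        "wx": float(first("wx", "0")),
        "wy": float(first("wy", "0")),
        "ww": float(first("ww", "1")),
        "wh": float(first("wh", "1")),
    }


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    example = ("http://example.org/digilib.jsp"
               "?fn=/permanent/library/XXXXXXXX/pageimg"
               "&pn=56&wx=0.25&wy=0.5&ww=0.5&wh=0.25")
    print(parse_zoom_url(example))
```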
On the basis of this text file, the Python script cut_images.py (attached to this page) cuts the images out of the original TIFF files by calling ImageMagick's identify and convert commands, and saves them in the format page-imagenumber (e.g. if page image 0056.tif has three figures, they are saved as 0056-01.tif, 0056-02.tif and 0056-03.tif). They are stored in a folder called figures.
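The naming scheme and the cropping step can be sketched roughly as follows. This is not the code of cut_images.py, only an illustration under assumptions: the zoom area is given as fractions of the page, the pixel size of the TIFF is already known (for instance from `identify -format "%w %h" 0056.tif`), and the function names are invented for this example.

```python
import os


def figure_name(page_image, index):
    """Build the output name, e.g. ("0056.tif", 1) -> "0056-01.tif"."""
    stem, ext = os.path.splitext(os.path.basename(page_image))
    return f"{stem}-{index:02d}{ext}"


def crop_geometry(width, height, wx, wy, ww, wh):
    """Turn a relative area into an ImageMagick geometry string WxH+X+Y."""
    return (f"{round(ww * width)}x{round(wh * height)}"
            f"+{round(wx * width)}+{round(wy * height)}")


def convert_command(page_image, index, geometry, out_dir="figures"):
    """Return the argument list for ImageMagick's convert."""
    target = f"{out_dir}/{figure_name(page_image, index)}"
    # +repage resets the canvas offset so the result is a standalone image.
    return ["convert", page_image, "-crop", geometry, "+repage", target]


if __name__ == "__main__":
    geometry = crop_geometry(2000, 3000, 0.25, 0.5, 0.5, 0.25)
    print(" ".join(convert_command("0056.tif", 1, geometry)))
```

The resulting argument list could be passed to `subprocess.run(..., check=True)` once the figures folder exists.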
This folder is then uploaded to the foxridge server alongside the pageimg folder of the respective book, so that online_permanent/library/XXXXXXXX contains both a pageimg and a figures folder.
Finally, make sure that the excluded images have been removed from the XML file.

Attachments (5)

echo-figures.xml (3.1 MB) - added by Klaus Thoden 13 years ago. An XML document containing the XQuery results for "echo:figure" across all 78 ECHO documents
echo-figures.html (4.1 MB) - added by Klaus Thoden 13 years ago. HTML version of the XQuery results for all figures in ECHO documents
arch_cut_images.py (5.5 KB) - added by Klaus Thoden 13 years ago. The same tool for Archimedes files; one day there will be one tool for all
cut_images.py (5.5 KB) - added by Klaus Thoden 13 years ago. A bit more comfortable
Alvarus_1509_YHKVZ7B4.fig (2.4 KB) - added by Klaus Thoden 13 years ago. Figure coordinates for Alvarus
