Changes between Version 4 and Version 5 of Cutting out images


Timestamp:
Nov 24, 2010, 1:57:18 PM (13 years ago)
Author:
Klaus Thoden
Comment:

Script more or less finished.

  • Cutting out images

    v4 v5  
    1414 Depending on how much context you like (with caption or not).
    1515 1. Now, in the viewing environment, you can browse through the book, visit the respective pages and mark the images using digilib's "zoom area" tool. Be careful not to cut out surrounding text, including the catchword at the bottom of the page. Captions, however, are to be cut out as well. Some small images are purely decorative; the policy is not to cut these out. Later on, they have to be deleted from the XML file.
    16  1. Save the URLs of the pages with the zoomed area into a text file. Also, note in the file which images have to be removed from the XML.
    17 
    18 Some steps missing (include creating a script to extract the figures from the raw TIFF files based on the list with the URLs). Also the list should be stored somewhere for future reference (e. g. in case the figures are not produced by cutting the files, but by extracting the figures on the fly). The first list was generated for [http://it-dev.mpiwg-berlin.mpg.de:81/tracs/mpdl-project-content/attachment/wiki/WO3_Apian_1550/Apian_1550_PUBSU9QD_figure-coords.txt Apian 1550].
    19 
    20  1. Save the cropped images again as TIFF in the format {{{page-imagenumber}}}, e. g., if pageimage {{{0056.tif}}} has three figures, these figures will be saved as {{{0056-01.tif}}}, {{{0056-02.tif}}} and {{{0056-03.tif}}}, respectively.
    21  1. Put the figures into a folder named {{{figures}}} and upload it to the foxridge server alongside the {{{pageimg}}}-folder of the respective book: folder {{{online_permanent/library/XXXXXXXX}}} will then contain both a {{{pageimg}}} and a {{{figures}}} folder.
     16 1. Save the URLs of the pages with the zoomed area into a text file. Also, note in the file which images have to be removed from the XML. The list should be stored somewhere for future reference (e. g. in case the figures are not produced by cutting the files, but by extracting the figures on the fly). The first list was generated for [http://it-dev.mpiwg-berlin.mpg.de:81/tracs/mpdl-project-content/attachment/wiki/WO3_Apian_1550/Apian_1550_PUBSU9QD_figure-coords.txt Apian 1550].
     17 1. On the basis of this text file, the Python script cut_images.py (attached to this page) cuts the images out of the original TIFF files by calling ImageMagick's {{{identify}}} and {{{convert}}} commands and saves them in the desired format {{{page-imagenumber}}} (e. g., if page image {{{0056.tif}}} has three figures, these figures will be saved as {{{0056-01.tif}}}, {{{0056-02.tif}}} and {{{0056-03.tif}}}). The cropped figures are stored in a folder called {{{figures}}}.
     18 1. This folder is then uploaded to the foxridge server alongside the {{{pageimg}}}-folder of the respective book: folder {{{online_permanent/library/XXXXXXXX}}} will then contain both a {{{pageimg}}} and a {{{figures}}} folder.
     19 1. Make sure that the excluded images are removed from the XML file.
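
The cutting step described above can be sketched roughly as follows. This is not the attached cut_images.py, only a minimal illustration under several assumptions: each line of the text file is a digilib URL carrying the page number in {{{pn}}} and the zoomed area as relative coordinates in {{{wx}}}, {{{wy}}}, {{{ww}}} and {{{wh}}}; page {{{pn=56}}} corresponds to the file {{{0056.tif}}}; each TIFF holds a single image; and the function names, the example host and the default folder names are hypothetical.

```python
import os
import subprocess
from collections import defaultdict
from urllib.parse import urlparse, parse_qs


def parse_area(url):
    """Return (page, wx, wy, ww, wh) from a digilib URL (assumed layout)."""
    query = parse_qs(urlparse(url).query)
    page = int(query["pn"][0])
    # Missing area parameters fall back to the full page.
    defaults = (("wx", "0"), ("wy", "0"), ("ww", "1"), ("wh", "1"))
    wx, wy, ww, wh = (float(query.get(key, [d])[0]) for key, d in defaults)
    return page, wx, wy, ww, wh


def crop_geometry(width, height, wx, wy, ww, wh):
    """Turn a relative area into an ImageMagick geometry string WxH+X+Y."""
    return "{}x{}+{}+{}".format(
        round(ww * width), round(wh * height),
        round(wx * width), round(wy * height))


def figure_name(page, index):
    """Build the page-imagenumber file name, e.g. 0056-03.tif."""
    return "{:04d}-{:02d}.tif".format(page, index)


def image_size(path):
    """Ask ImageMagick's identify for the pixel size of a page image."""
    out = subprocess.check_output(["identify", "-format", "%w %h", path])
    width, height = out.decode().split()
    return int(width), int(height)


def cut_all(urls, pageimg_dir="pageimg", out_dir="figures"):
    """Crop every listed area and number the figures per page."""
    os.makedirs(out_dir, exist_ok=True)
    counters = defaultdict(int)
    for url in urls:
        page, wx, wy, ww, wh = parse_area(url)
        counters[page] += 1
        src = os.path.join(pageimg_dir, "{:04d}.tif".format(page))
        dst = os.path.join(out_dir, figure_name(page, counters[page]))
        width, height = image_size(src)
        geometry = crop_geometry(width, height, wx, wy, ww, wh)
        # +repage drops the virtual canvas offset left over from the crop.
        subprocess.check_call(
            ["convert", src, "-crop", geometry, "+repage", dst])
```

Given a list of URLs read from the text file, {{{cut_all(urls)}}} would write the cropped figures into {{{figures}}}, numbering them in the order in which they appear for each page. Lines in the file that only note images to be removed from the XML would have to be filtered out beforehand.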