Changes between Version 13 and Version 14 of Cutting out images


Timestamp:
Dec 16, 2010, 1:08:31 PM (13 years ago)
Author:
Klaus Thoden
Comment:

--

  • Cutting out images

    v13 v14  
    17 17 1. Save the URLs of the pages with the zoomed area into a text file. Keyboard shortcuts come in handy here: {{{cmd-l-c-w}}} copies the link in the address bar and closes the tab, {{{cmd-TAB}}} switches to a text editor, and {{{cmd-v}}} inserts the link. To note in the file which images have to be removed from the XML, copy the URL to the text file, but insert a {{{#}}} before it. You can also write other comments into this file, but be sure to begin each such line with a {{{#}}}. Save the resulting list in a new directory on the same level as the {{{raw}}} and the {{{xml}}} directories (see [source:/trunk/texts/WO_1/Stevin_1605] for an example). With practice, the average speed for cutting out figures is 2.5 figures per minute (Stevin_1605 was completed in 2 hours).
    18 18 1. Based on this text file, the Python script [source:/trunk/schema/scripts/cut_figures/cut_figures.py cut_figures.py] cuts the images out of the original TIFF files by calling ImageMagick's {{{identify}}} and {{{convert}}} commands and saves them under names of the form {{{page-imagenumber}}} (e.g., if page image {{{0056.tif}}} has three figures, they are saved as {{{0056-01.tif}}}, {{{0056-02.tif}}} and {{{0056-03.tif}}}). The figures are stored in a folder called {{{figures}}}.
       19 1. The following error might occur:
       20 {{{
       21 convert: AnErrorHasOccurredReadingFromFile `/Volumes/online_permanent/archimedes_repository/large/stevi_stati_527_la_1605/527-01-pageimg/527.01.139.jpg': Bad file descriptor @ constitute.c/ReadImage/575.
       22 convert: missing an image filename `/Volumes/online_permanent/archimedes_repository/large/stevi_stati_527_la_1605/figures/527.01.139-02.jpg' @ convert.c/ConvertImageCommand/2775.
       23 }}}
       24 1. In that case, rename the directory on Foxridge, run the script again, and merge the directories. Repeat this until the number of extracted figures reported by the script matches the number of files in the {{{figures}}} directory.
    19 25 1. Make sure that the excluded images are removed from both the raw text and the XML file.
    20 26
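
The conventions described in the steps above (comment lines starting with {{{#}}}, the {{{page-imagenumber}}} naming scheme, and the check that every expected figure is present in {{{figures}}}) can be sketched in Python as follows. This is only an illustration and not the code of {{{cut_figures.py}}}: the function names {{{read_url_list}}}, {{{figure_names}}} and {{{missing_figures}}} are hypothetical, and the sketch assumes one URL per line in the list file and two-digit figure numbers as in the {{{0056-01.tif}}} example.

```python
from pathlib import Path


def read_url_list(lines):
    """Split list-file lines into figure URLs and '#' lines.

    Lines starting with '#' hold either free comments or the URLs of
    images that are to be removed from the XML (assumed convention).
    """
    urls, commented = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            commented.append(line[1:].strip())
        else:
            urls.append(line)
    return urls, commented


def figure_names(page_stem, count, ext=".tif"):
    """Build names in the page-imagenumber scheme, e.g. 0056-01.tif."""
    return [f"{page_stem}-{n:02d}{ext}" for n in range(1, count + 1)]


def missing_figures(figures_dir, expected_names):
    """Return the expected names that are not present in figures_dir."""
    present = {p.name for p in Path(figures_dir).iterdir() if p.is_file()}
    return sorted(set(expected_names) - present)
```

If {{{missing_figures}}} returns a non-empty list after a run, that would indicate that some figures were not written (for example because of the read error shown above) and that the script needs to be run again.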