Changes between Initial Version and Version 1 of HarriotWorkflow


Timestamp:
Oct 16, 2013, 4:45:38 PM
Author:
Klaus Thoden
Comment:

--

  • HarriotWorkflow

The scholars of the Harriot online project work with an enhanced version of the ECHO XML schema that allows them to insert commentary and editorial remarks. In a second step, this markup has to be converted to the regular ECHO schema, and the LaTeX shorthand for math has to be transformed as well. The whole process is handled mainly by three scripts:
 - [source:/trunk/schema/scripts/Harriot/cleanURL.py cleanURL.py] replaces ampersands in URLs and removes the {{{xsi:schemaLocation}}} attribute from the header.
 - [source:/digitizing-tools/scripts/mathml/mathml-wrapper.py mathml-wrapper.py] converts LaTeX math code to MathML.
 - [source:/trunk/schema/scripts/Harriot/adjustHarriot.xsl adjustHarriot.xsl] does the main work of converting the Harriot-specific markup into ECHO-conformant elements.
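
To make the first step more concrete, here is a minimal sketch of the kind of clean-up described for cleanURL.py. It is not the actual script: the function name {{{clean_text}}} and both regular expressions are hypothetical, and the sketch assumes that bare ampersands are to be escaped as {{{&amp;}}} and that it is acceptable to apply the replacement to the whole text rather than to URLs only.

{{{
#!python
import re

# An ampersand that does not already start an entity or character reference.
BARE_AMPERSAND = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

# An xsi:schemaLocation attribute together with its leading whitespace.
SCHEMA_LOCATION = re.compile(r"""\s+xsi:schemaLocation\s*=\s*(?:"[^"]*"|'[^']*')""")


def clean_text(xml_text):
    """Escape bare ampersands and drop the first xsi:schemaLocation attribute.

    Illustrative only; the real cleanURL.py may work differently.
    """
    escaped = BARE_AMPERSAND.sub("&amp;", xml_text)
    return SCHEMA_LOCATION.sub("", escaped, count=1)


sample = '<echo xsi:schemaLocation="http://example.org/ns schema.xsd"><link href="http://example.org/?a=1&b=2"/></echo>'
print(clean_text(sample))
}}}

Ampersands that already belong to a reference such as {{{&amp;}}} are left alone, so running the sketch twice on the same text does not double-escape anything.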

A shell function is convenient for running all the scripts in sequence (adjust the paths to your setup):
{{{
#!zsh
# Convert a Harriot-enhanced ECHO XML file into a regular ECHO file.
# Usage: harriot FILE.xml
# Run it from the directory that contains cleanURL.py and adjustHarriot.xsl.
harriot() {
    local input="$1"
    # foo.xml -> foo-adjusted.xml
    local output="${input%.xml}-adjusted.xml"

    if [[ -r "$output" ]]; then
        echo "Removing previous conversion file"
        rm -v "$output"
    fi

    # Every step is chained with &&, so a failure stops the pipeline.
    echo "Correcting URLs in $input" &&
    python cleanURL.py "$input" &&
    echo "Checking for well-formed XML" &&
    xmllint --noout 01_cleanedURL.xml &&
    echo "Converting math" &&
    /opt/local/bin/python2.7 /Users/kthoden/eclipse/projects/digitizing-tools/scripts/mathml/mathml-wrapper.py --outputTextFile=02_mathConverted.xml --console=/tmp/console.txt 01_cleanedURL.xml &&
    echo "Replacing things in $input" &&
    java -jar /Users/kthoden/XML-ECHO-SVN/trunk/schema/thirdparty/saxonhe9-2-1-1j/saxon9he.jar -xsl:adjustHarriot.xsl -s:02_mathConverted.xml -o:"$output" &&
    echo "Removing temporary files" &&
    rm -v 01_cleanedURL.xml 02_mathConverted.xml &&
    echo "Is it valid?" &&
    java -jar /Users/kthoden/XML-ECHO-SVN/trunk/schema/thirdparty/jing-20091111/bin/jing.jar -c /Users/kthoden/XML-ECHO-SVN/trunk/schema/schema/echo/echo.rnc "$output" &&
    echo "Finished"
}
}}}
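
The well-formedness check in the middle of the function matters because a single unescaped ampersand is enough to make the later XSLT step fail. As a rough illustration of what such a check does, the following sketch uses Python's standard {{{xml.etree.ElementTree}}} parser as a stand-in for {{{xmllint --noout}}}; the helper name {{{is_well_formed}}} is hypothetical and not part of the workflow.

{{{
#!python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# An escaped ampersand in a URL is fine, a bare one is rejected.
print(is_well_formed('<link href="http://example.org/?a=1&amp;b=2"/>'))
print(is_well_formed('<link href="http://example.org/?a=1&b=2"/>'))
}}}

Like {{{xmllint --noout}}}, this only tests well-formedness. Validation against the ECHO schema is a separate step, which the function performs with jing.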

The resulting files carry an {{{-adjusted}}} infix. They then have to be renamed and moved to [source:/trunk/texts/eXist/echo/en] (for the time being) so that they can be ingested into the ECHO system.
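
The naming rule for the result files can be stated as a small function: the {{{-adjusted}}} infix goes directly before the {{{.xml}}} extension. The sketch below is only an illustration of that rule; the name {{{adjusted_name}}} is hypothetical, and rejecting other extensions is an assumption of this sketch.

{{{
#!python
import os


def adjusted_name(path):
    """Insert the -adjusted infix before the .xml extension of path."""
    root, ext = os.path.splitext(path)
    if ext != ".xml":
        raise ValueError("expected a file name ending in .xml: %r" % path)
    return root + "-adjusted" + ext


print(adjusted_name("harriot.xml"))
print(adjusted_name("my_xml_notes.xml"))
}}}

Splitting off the extension avoids a pitfall of pattern-based renaming: a pattern such as {{{.xml}}} with an unescaped dot also matches {{{_xml}}} in the middle of a name.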