Changes between Version 2 and Version 3 of HarriotWorkflow


Timestamp:
Jul 22, 2015, 4:29:45 PM (9 years ago)
Author:
Klaus Thoden
Comment:

--

  • HarriotWorkflow

    v2 v3  
    1 The scholars of the Harriot online project are working with an enhanced version of the ECHO XML schema which allows them to insert commentary and editorial remarks. These markings have in a second step to be converted to the regular ECHO schema. Also, the LaTeX shorthand for math has to be transformed. This whole process is handled mainly by three scripts
    2  - [source:/trunk/schema/scripts/Harriot/cleanURL.py cleanURL.py] replaces ampersands in URLs and also removes the {{{xsi:schemaLocation}}} from the header
    3  - [source:/digitizing-tools/scripts/mathml/mathml-wrapper.py mathml-wrapper.py] converts LaTeX math code to mathml.
    4  - [source:/trunk/schema/scripts/Harriot/adjustHarriot.xsl adjustHarriot.xsl] does the main work in converting the Harriot-specific markup into ECHO-conform elements.
    5 
    6 A shell function is quite convenient for dealing with all the scripts in a row (adjust paths to your needs):
     1[[PageOutline(1-4,,pullout)]]
     2
     3= XML Encoding =
     4The scholars of the Harriot online project work with an enhanced version of the ECHO XML schema that allows them to insert commentary and editorial remarks. In a second step, this markup has to be converted to the regular ECHO schema before the files can be uploaded and displayed in the ECHO display environment. The LaTeX shorthand for math also has to be transformed into MathML.
     5
     6== Editing ==
     7Scholars working on the edition currently use either Emacs or XeMeL, an Eclipse-based XML editor with an SVN client. We recommend text editors that offer schema-based autocompletion, which facilitates editing. Attached are two versions of that schema:
     8 * an XSD version, which can be used while editing texts in XeMeL. To make it work properly, the following line has to be added to the {{{echo}}} tag at the top of the document:
     9{{{
     10xsi:schemaLocation="http://www.mpiwg-berlin.mpg.de/ns/echo/1.0/ harriot_xsd/echo.xsd "
     11}}}
     12
     13 *  and an RNC version for use in e. g. Emacs
     14
     15More information can be found on [wiki:XMLDESpecs#ConnectiontotheECHOSchemaandAutocompletion the general page on XML editing].
     16
     17As for formulas, it is possible to use LaTeX markup. A script converts that code into MathML ([https://it-dev.mpiwg-berlin.mpg.de/svn/digitizing-tools/scripts/mathml/ link]).
     18
     19== Upgrading XeMeL ==
     20As of 2015, the repository uses Subversion 1.7, which means that local working copies have to be upgraded as well (by issuing {{{svn upgrade}}} on the command line). Existing copies of XeMeL also need to be updated. A new version is located at http://ocropus.mpiwg-berlin.mpg.de/~kthoden/XeMeL.zip. If you want to update your copy yourself, download http://subclipse.tigris.org/files/documents/906/49280/site-1.8.22.zip and http://www.svnkit.com/org.tmatesoft.svn_1.7.14.src.zip and extract both in the root folder of XeMeL (so that the files end up in the {{{plugins}}} directory and so on). In addition, the correct SVN interface has to be chosen in the preferences of XeMeL: SVNKit (!PureJava).
     21
     22== Macros ==
     23For both Eclipse and Emacs, templates have been created to quickly insert commentaries and translations. As these will later be fed into the annotation system, the date and username are inserted as well. Unfortunately, in Eclipse the date format depends on your machine's language settings. Both template files are in the [source:/trunk/texts/Harriot/Documents/ SVN repository].
     24
     25= Map Editing =
     26The maps are edited with the free (but not open source) editor [http://www.yworks.com/en/products_yed_about.html yEd]. The symbols for the maps are stored in a [source:/trunk/texts/Harriot/Maps/paletteManuscripts.graphml palette], which makes sure that the right symbols are used.
     27
     28= Bibliography =
     29Two bibliographies are kept, one for [source:/trunk/texts/Harriot/Documents/Bibliography.bib secondary literature], the other for [source:/trunk/texts/Harriot/Documents/Sources.bib source books] that Harriot used and referred to. Both are in the Biblatex format, and an [http://echo.mpiwg-berlin.mpg.de/content/scientific_revolution/harriot/project_infos/harriot-bibliography HTML version] of them is included in the ECHO pages on this project.
     30
     31The conversion to HTML is done at the moment using [http://www.lri.fr/~filliatr/bibtex2html/ bibtex2html 1.96] with the following command:
     32{{{
     33./bibtex2html -o Export -nobibsource -unicode -a -i Sources.bib
     34}}}
     35= Storage of files =
     36As of September 2012, the project's files are also part of the [source:/trunk/texts/Harriot MPIWG-MPDL Content Project's repository]. This will make updating the repository or local copies much easier. The respective branch can be checked out by directing a Subversion client to {{{https://it-dev.mpiwg-berlin.mpg.de/svn/mpdl-project-content/trunk/texts/Harriot}}}. The structure of the repository remains the same:
     37 * [source:/trunk/texts/Harriot/Maps Maps] contains schematic maps of how the folios might be structured thematically
     38 * [source:/trunk/texts/Harriot/Transcripts Transcripts] contains transcriptions in XML files
     39 * [source:/trunk/texts/Harriot/Documents Documents] contains miscellaneous documents, e. g. typing conventions
     40
     41= Conversion workflow =
     42To facilitate their work, the scholars are free to use shorthands for XML elements and to type formulas in LaTeX-style encoding. Before uploading, four scripts convert the XML into a version that can be uploaded via the [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/doc/doc-operation-exist.xql uploading interface]:
     43 1. [source:/trunk/schema/scripts/Harriot/cleanURL.py cleanURL.py] replaces ampersands in URLs and also removes the {{{xsi:schemaLocation}}} from the header
     44 1. [source:/schema/scripts/Harriot/bibtex_linker.py bibtex_linker.py] uses input from the Biblatex files (see above) and the Python library [https://bibtexparser.readthedocs.org/en/latest/index.html bibtexparser] to create links in the XML to the online resources or, if there is none, to the bibliography on ECHO.
     45 1. [source:/digitizing-tools/scripts/mathml/mathml-wrapper.py mathml-wrapper.py] converts LaTeX math code into MathML.
     46 1. [source:/trunk/schema/scripts/Harriot/adjustHarriot.xsl adjustHarriot.xsl] does the main work in converting the Harriot-specific markup into ECHO-conform elements.
     47
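
The first step in the list above can be illustrated with a minimal Python sketch. It is not the actual {{{cleanURL.py}}}: the function names and regular expressions are assumptions made for illustration, and the sketch escapes bare ampersands anywhere in the text rather than only inside URLs.

```python
import re

# Matches "&" that does not already start an entity such as &amp; or &#38;.
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)
# Matches the xsi:schemaLocation attribute together with its leading whitespace.
SCHEMA_LOCATION = re.compile(r'\s+xsi:schemaLocation="[^"]*"')


def escape_bare_ampersands(text):
    """Replace every unescaped "&" with "&amp;" (hypothetical helper)."""
    return BARE_AMPERSAND.sub("&amp;", text)


def strip_schema_location(text):
    """Remove the editor-only xsi:schemaLocation attribute (hypothetical helper)."""
    return SCHEMA_LOCATION.sub("", text)


if __name__ == "__main__":
    sample = (
        '<echo xsi:schemaLocation="http://www.mpiwg-berlin.mpg.de/ns/echo/1.0/ '
        'harriot_xsd/echo.xsd "><a href="http://example.org/?a=1&b=2"/></echo>'
    )
    print(strip_schema_location(escape_bare_ampersands(sample)))
```

Ampersands that already introduce an entity are left alone, so running the sketch twice on the same file does not double-escape anything.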
     48A shell function is quite convenient for dealing with all the scripts in a row and checking the XML for validity in between (adjust paths to your needs):
    749{{{
    850#!zsh
    951harriot() {
    10         # echo comment
     52    export SCRIPTS="/Users/kthoden/ECHO_svn/schema/scripts/Harriot"
     53    export ECHO_SCRIPTS_DIR="/Users/kthoden/src/eclipse/projects/digitizing-tools/scripts"
     54    # echo comment
     55    echo ============= $* =============
    1156           if [[ -r $(echo "$*" | sed s/.xml/-adjusted.xml/ ) ]]; then
    1257            echo Removing previous conversion file
    1358            rm -v $(echo "$*" | sed s/.xml/-adjusted.xml/ )
    1459           fi
    15            echo Correcting urls in "$*"
    16            python cleanURL.py $* &&
     60           echo Correcting URLs
     61           python $SCRIPTS/cleanURL.py $* &&
    1762           echo Checking for wellformed XML
    1863           xmllint --noout 01_cleanedURL.xml &&
     64           echo Adding bibliography items &&
     65           python3 /Users/kthoden/ECHO_svn/schema/scripts/Harriot/bibtex_linker.py 01_cleanedURL.xml &&
    1966           echo Then math out put &&
    20            /opt/local/bin/python2.7 /Users/kthoden/eclipse/projects/digitizing-tools/scripts/mathml/mathml-wrapper.py --outputTextFile=02_mathConverted.xml --console=/tmp/console.txt 01_cleanedURL.xml &&
    21            echo Replacing things in "$*"
    22            java -jar /Users/kthoden/XML-ECHO-SVN/trunk/schema/thirdparty/saxonhe9-2-1-1j/saxon9he.jar -xsl:adjustHarriot.xsl -s:02_mathConverted.xml -o:$(echo $*| sed s/.xml/-adjusted.xml/g) &&
     67           python /Users/kthoden/src/eclipse/projects/digitizing-tools/scripts/mathml/mathml-wrapper.py --outputTextFile=02_mathConverted.xml --console=/tmp/console.txt 01_cleanedURL-bib.xml &&
     68           echo Doing more adjustments things with "$*"
     69           java -jar /Users/kthoden/ECHO_svn/schema/thirdparty/saxonhe9-2-1-1j/saxon9he.jar -xsl:$SCRIPTS/adjustHarriot.xsl -s:02_mathConverted.xml -o:$(echo $*| sed s/.xml/-adjusted.xml/g) &&
    2370           echo Removing temporary files   
    24            rm -v 01_cleanedURL.xml 02_mathConverted.xml &&
     71           rm -v 01_cleanedURL.xml 01_cleanedURL-bib.xml 02_mathConverted.xml &&
    2572           echo Is it valid?
    26            java -jar /Users/kthoden/XML-ECHO-SVN/trunk/schema/thirdparty/jing-20091111/bin/jing.jar -c /Users/kthoden/XML-ECHO-SVN/trunk/schema/schema/echo/echo.rnc $(echo $*| sed s/.xml/-adjusted.xml/g)
    27            echo Finished
     73           java -jar /Users/kthoden/ECHO_svn/schema/thirdparty/jing-20091111/bin/jing.jar -c /Users/kthoden/ECHO_svn/schema/schema/echo/echo.rnc $(echo $*| sed s/.xml/-adjusted.xml/g)
     74           echo Finis.
     75           echo ====================================
    2876}
    2977}}}
     
    3179The resulting files carry an {{{-adjusted}}} infix. They then have to be moved to [source:/trunk/texts/eXist/echo/en] (for the time being) and renamed before being ingested into the ECHO system.
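
The naming rule for the converted files can be sketched in Python. The shell function derives the name with {{{sed s/.xml/-adjusted.xml/}}}, where the unescaped dot matches any character, so the sketch below derives it from the final extension only. The function name {{{adjusted_name}}} and the example file name are assumptions for illustration and are not part of the project's scripts.

```python
from pathlib import Path


def adjusted_name(source):
    """Return the path of the converted file for a transcript path.

    "MS6782.xml" becomes "MS6782-adjusted.xml"; only the last extension
    is touched, and the directory part is kept unchanged.
    """
    path = Path(source)
    return path.with_name(path.stem + "-adjusted" + path.suffix)


if __name__ == "__main__":
    print(adjusted_name("Transcripts/MS6782.xml"))
```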
    3280
    33 == Additional scripts ==
    34 === html2pt ===
    35 to be documented
    36 === makeIndex ===
    37 to be documented
    38 === menuMaker ===
    39 to be documented
     81
     82= Map conversion workflow =
     83== Export to static maps ==
     84The maps can be exported to HTML, and a [http://echo.mpiwg-berlin.mpg.de/content/scientific_revolution/harriot/maps/0_TOPICS.pt webpage] displays all the maps for browsing.
     85
     86Export from yEd has to be done manually for each map (there is no mass exporter), but the settings remain stable per session. The required steps are:
     87 1. Set the export directory (remains stable per session)
     88 1. Export as HTML-Imagemap with the following settings
     89  1. Clipping: Default (leave settings as they are)
     90  1. Image: Choose PNG and Antialiasing
     91  1. HTML: uncheck both boxes (open link in new window, export description as tooltip) and replace the existing template with the following code (adapted to ECHO's Zope environment):
     92{{{
     93<html metal:use-macro="here/main_template/macros/page">
     94<head><title></title>
     95<!-- insert date here -->
     96<style type="text/css">
     97.tooltip {
     98  font-size:10pt;
     99  background-color:#FFFFCC;
     100  border:1px solid black;
     101  padding:2px
     102}
     103</style>
     104</head>
     105<body>
     106<span metal:fill-slot="body">
     107<a target="_blank" href="./index.pt">Index of topics</a>&nbsp;&nbsp;&nbsp;<a target="_blank" href="./legendManuscripts.pt">Legend</a>
     108
     109${DIAGRAM}
     110</span>
     111</body>
     112</html>
     113}}}
     114  1. The "Sources" section contains links to external sources, describing the persons (Wikipedia, DNB, VIAF &c). A little bit more code has to be kept in the page to make the little menu work that pops up when the name is being clicked:
     115{{{
     116<html metal:use-macro="here/main_template/macros/page">
     117  <head>
     118    <title>Empty Title</title>
     119<!-- insert date here -->
     120  </head>
     121  <body>
     122    <span metal:fill-slot="body">
     123      <style type="text/css">
     124        #authorMenu {display:none;
     125        position:absolute;
     126        z-index:200; /* always on top*/
     127        padding-left: 35px;
     128        margin-left: 100px;
     129        margin-top: 12em;
     130        width: 250px;
     131        border: 2px solid rgba(128, 128, 128, 0.5);
     132        border-style: ridge;
     133        border-radius: 10px;
     134        background: rgba(128, 128, 128, 0.5);
     135        /* background-color: #777; */
     136        color: white;
     137        font-size: 0.95em;
     138        }
     139      </style>
     140      <script type="text/javascript">
     141        function showElement(layer){
     142        var myLayer = document.getElementById(layer);
     143        if(myLayer.style.display!="block"){
     144        myLayer.style.display="block";
     145        myLayer.style.backgroundPosition="top";
     146        } else {
     147        myLayer.style.display="none";
     148        }
     149        }
     150      </script>
     151      <a target="_blank" href="./index.pt">Index of topics</a>&nbsp;&nbsp;&nbsp;<a target="_blank" href="./legendManuscripts.pt">Legend</a>
     152      <!-- insert menu here -->
     153      ${DIAGRAM}
     154    </span>
     155  </body>
     156</html>
     157
     158}}}
     159  1. Furthermore, a different script has to be executed on these maps. It also requires the persons maps to be exported separately from the others (although it probably would not hurt to have the CSS and JavaScript snippets in the other maps as well)
     160  1. Tiling: Do not activate Tiling
     161 1. To export all the maps, it is best to open all the files, export the first one (thereby applying the settings above) and close it. After that, the following key sequence can be used: {{{Cmd-E, Return, Return, Cmd-W}}}. This exports and closes each map in turn.
     162 1. The resulting html files have to be edited, because a link to another map (i. e. another graphml file) will retain the extension graphml in the source code. The python script [source:/trunk/schema/scripts/Harriot/html2pt.py html2pt] takes care of this replacement and renames the files to the extension {{{*.pt}}}. The script can be called in a loop {{{for i in *.html; do echo $i; python /Users/kthoden/XML-ECHO-SVN/trunk/schema/scripts/Harriot/html2pt.py $i;done}}} (adjust path).
     163 1. The persons maps also require another script, because so far the information about the external links is not included. Right now, it resides in the file [source:/trunk/texts/Harriot/Documents/persons.csv persons.csv] which is then evaluated by the script [source:/trunk/schema/scripts/Harriot/menuMaker.py menuMaker].
     164 1. The pt-files have to be copied to {{{tuxserve03:18021/echo_nav/echo_pages/content/scientific_revolution/harriot/maps}}}
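
The link replacement described in the steps above can be sketched as follows. This is not the actual {{{html2pt.py}}}: the function names and the regular expression are assumptions for illustration, and the sketch writes a new {{{.pt}}} file next to the exported HTML file instead of renaming it.

```python
import re
from pathlib import Path

# href values that end in ".graphml", optionally followed by a fragment.
GRAPHML_LINK = re.compile(r'(href="[^"#]*)\.graphml(?=["#])')


def rewrite_links(html):
    """Point links to other maps at their .pt pages (hypothetical helper)."""
    return GRAPHML_LINK.sub(r"\1.pt", html)


def convert(html_file):
    """Write a .pt copy of an exported HTML map with rewritten links."""
    source = Path(html_file)
    target = source.with_suffix(".pt")
    text = source.read_text(encoding="utf-8")
    target.write_text(rewrite_links(text), encoding="utf-8")
    return target


if __name__ == "__main__":
    print(rewrite_links('<area href="0_TOPICS.graphml" />'))
```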
     165
     166== Export to interactive maps ==
     167There is also an experimental representation of the maps as an RDF graph ([http://euler.mpiwg-berlin.mpg.de/LodLive/?http://example.org/harriotOnt/HarriotGraph Enter here]). As of now, it is only accessible inside the institute.
     168
     169The visualization is made with a tool called [http://blog.lodlive.it/ LodLive], and the graph can also be queried on the [http://euler.mpiwg-berlin.mpg.de/LodLive/ query page]. Instructions on how to use the tool are [https://it-dev.mpiwg-berlin.mpg.de/tracs/mpdl-project-content/raw-attachment/wiki/Harriot/using_lodlive.pdf attached to this page].
     170
     171Thorough documentation will soon be available on [http://intern.mpiwg-berlin.mpg.de/digitalhumanities/it-group-projects/harriot/dokumentation Drupal]. You can browse [https://it-dev.mpiwg-berlin.mpg.de/tracs/metadataprovider/browser/GraphML2RDF%20(PYTHON) the source code] of the conversion script or check it out from the Mercurial repository with the command
     172
     173{{{
     174hg clone https://it-dev.mpiwg-berlin.mpg.de/hg/graphML2RDF
     175}}}
     176
     177= Index of groups =
     178There is now also an index page, generated directly from the graphml-files. This is an alphabetical list of all the group headings. From inside the Maps directory, call the [source:/trunk/schema/scripts/Harriot/makeIndex.py makeIndex] script (at least Python 2.7 required) and copy the resulting {{{index.pt}}} also to {{{tuxserve03:18021/echo_nav/echo_pages/content/scientific_revolution/harriot/maps}}}.
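
How such an index could be collected is sketched below. This is not the actual {{{makeIndex.py}}}: it assumes that yEd stores group headings as {{{y:NodeLabel}}} elements inside {{{y:GroupNode}}} elements, the function name is hypothetical, and the sketch only returns the sorted headings without writing {{{index.pt}}}.

```python
import xml.etree.ElementTree as ET

# Namespace used by yEd for its node realizers (assumed here).
NS = {"y": "http://www.yworks.com/xml/graphml"}


def group_headings(graphml_text):
    """Return the distinct group labels of one map, sorted alphabetically."""
    root = ET.fromstring(graphml_text)
    headings = set()
    # A group usually has one realizer for the open and one for the closed
    # state, each with its own label, so duplicates are collapsed by the set.
    for label in root.iterfind(".//y:GroupNode/y:NodeLabel", NS):
        text = (label.text or "").strip()
        if text:
            headings.add(text)
    return sorted(headings, key=str.casefold)
```

Sorting with {{{str.casefold}}} keeps upper-case and lower-case headings in one alphabetical sequence.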
     179
     180= Linked data =
     181The library's metadata was also transformed into RDF triples in the [http://dm2e.eu DM2E project], for example:
     182
     183{{{
     184<http://data.dm2e.eu/data/item/mpiwg/harriot/MPIWG:1SNN2GVM>
     185        a                      edm:ProvidedCHO ;
     186        dm2e:callNumber        "HMC 240 III" ;
     187        dm2e:levelOfHierarchy  1 ;
     188        dm2e:publishedAt       <http://data.dm2e.eu/data/place/mpiwg/harriot/MPIWG:1SNN2GVM/Petworth_UK> ;
     189        dc:description         "Scanned Document by the DIGIGROUP" ;
     190        dc:identifier          "MPIWG:1SNN2GVM" ;
     191        dc:language            "en" ;
     192        dc:title               "Harriot papers: 254-321 [HMC 240 III]" ;
     193        dc:type                dm2e:Manuscript ;
     194        pro:author             <http://data.dm2e.eu/data/agent/mpiwg/authority_gnd/118720473> ;
     195        void:inDataset         <http://data.dm2e.eu/data/dataset/mpiwg/harriot/20150113133045860> ;
     196        edm:type               "TEXT" .
     197}}}
     198
     199This record can also be displayed in the [http://data.dm2e.eu/data/html/resourcemap/mpiwg/harriot/MPIWG:1SNN2GVM/20150113133045860 DM2E datastore].
     200
     201It was then exported to [http://www.europeana.eu/portal/record/2048606/data_item_mpiwg_harriot_MPIWG_1SNN2GVM.html Europeana].
     202
     203The Europeana record in turn links back to the MPIWG view.
     204
     205== Exemplary editing with Pundit ==
     206On top of that, the [http://thepund.it/ Pundit annotation system] can be used to annotate the pages, using the system as described on [wiki:pundit elsewhere on this wiki]. The bookmarklets from the page http://it-dev.mpiwg-berlin.mpg.de/pundit2/build/texttoolbm.html have to be installed in the browser.
     207
     208Using a [http://mpdl-system.mpiwg-berlin.mpg.de:30060/mpdl/page-query-result.xql?document=%2Fecho%2Fen%2FReallyClean6782.xml&mode=text&pn=1 version of MS 6782] with all of the commentary removed, a [http://demo-cloud.ask.thepund.it/#/notebooks/802ea36e Pundit notebook] was started, inserting some existing annotations and using the representations of the pages in the datastore and the language technology to link Latin words to MPIWG's dictionaries. These annotations are also RDF triples.
     209
     210In the notebook, clicking on the plus in the bottom right corner gives you additional information on the annotation and also the link to jump to the page.
     211
     212Another visualization of the annotations is this graph in !LodLive: http://demo-lodlive.thepund.it/?http://purl.org/pundit/demo-cloud-server/user/c6daa448
     213
     214
     215