wiki:HarriotWorkflow

XML Encoding

The scholars of the Harriot online project work with an enhanced version of the ECHO XML schema that allows them to insert commentary and editorial remarks. In a second step, this markup has to be converted to the regular ECHO schema before the files can be uploaded and displayed in the ECHO display environment. In addition, the LaTeX shorthand for math has to be transformed into MathML.

Editing

Scholars working on the edition currently use either Emacs or XeMeL, an Eclipse-based XML editor with an SVN client. We recommend text editors that offer schema-based autocompletion, which makes editing easier. Two versions of the schema are attached:

  • an XSD version which can be used while editing texts in XeMeL. To make it work properly, the following attribute has to be added to the echo tag at the top of the document:
    xsi:schemaLocation="http://www.mpiwg-berlin.mpg.de/ns/echo/1.0/ harriot_xsd/echo.xsd "
    
  • and an RNC version for use in e. g. Emacs

More information is to be found at the general page on XML editing.

As for formulas, it is possible to use LaTeX markup. There is a script that converts that code into MathML (link).

Upgrading XeMeL

As of 2015, the repository uses Subversion 1.7, which means that local working copies have to be upgraded as well (by issuing svn upgrade on the command line). Existing copies of XeMeL also need to be updated. A new version is located at http://ocropus.mpiwg-berlin.mpg.de/~kthoden/XeMeL.zip. If you want to update your copy yourself, download http://subclipse.tigris.org/files/documents/906/49280/site-1.8.22.zip and http://www.svnkit.com/org.tmatesoft.svn_1.7.14.src.zip and extract both in the root folder of XeMeL (so that the files end up in the plugins directory and so on). Also, in the preferences of XeMeL, the correct SVN interface has to be chosen: SVNKit (PureJava).

Macros

For both Eclipse and Emacs, templates have been created to quickly insert commentaries and translations. As these will later be fed into the annotation system, date and username are also inserted. Unfortunately, in Eclipse the date format depends on your machine's language settings. Both template files are in the SVN repository.

Map Editing

The maps are edited with the free (but not open source) editor yEd. The symbols for the maps are stored in a palette, which makes sure that the right symbols are used.

Bibliography

Two bibliographies are kept, one for secondary literature, the other for source books that Harriot used and referred to. Both are in the Biblatex format, and an HTML version of each is included in the ECHO pages on this project.

The conversion to HTML is currently done with bibtex2html 1.96, using the following commands:

bibtex2html -o Sources -nobibsource -unicode -a -i Sources.bib
bibtex2html -o Bib -nobibsource -unicode -a -i Bibliography.bib

Each resulting HTML document contains a table with the bibliographical entries. These tables are to be inserted into the following template:

<h3><a name="sources">Sources used by Thomas Harriot</a></h3>
<!-- Sources table goes here -->
<hr/>
<h3><a name="bib">Bibliography</a></h3>
<!-- Bibliography table goes here-->

and then inserted into the Zope page found at http://echo.mpiwg-berlin.mpg.de/content/scientific_revolution/harriot/project_infos/harriot-bibliography
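The manual copy-and-paste step could be scripted. The following is only a minimal sketch, assuming that each bibtex2html output file contains exactly one non-nested table. The function names extract_table and build_page are hypothetical and not part of the project's scripts.

```python
import re

# The template from this page, with placeholders for the two tables.
TEMPLATE = """<h3><a name="sources">Sources used by Thomas Harriot</a></h3>
{sources}
<hr/>
<h3><a name="bib">Bibliography</a></h3>
{bib}
"""


def extract_table(html):
    """Return the first <table>...</table> block of an HTML document.

    Assumption: the table is not nested inside another table.
    """
    match = re.search(r"<table\b.*?</table>", html, flags=re.DOTALL | re.IGNORECASE)
    if match is None:
        raise ValueError("no <table> element found")
    return match.group(0)


def build_page(sources_html, bib_html):
    """Fill the template with the tables taken from the two HTML documents."""
    return TEMPLATE.format(
        sources=extract_table(sources_html),
        bib=extract_table(bib_html),
    )
```

With the commands above, sources_html and bib_html would be the contents of Sources.html and Bib.html. The result still has to be pasted into the Zope page by hand.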

Storage of files

As of September 2012, the project's files are also part of the MPIWG-MPDL Content Project's repository. This will make updating the repository or local copies much easier. The respective branch can be checked out by directing a Subversion client to https://it-dev.mpiwg-berlin.mpg.de/svn/mpdl-project-content/trunk/texts/Harriot. The structure of the repository remains the same:

  • Maps contains schematic maps of how the folios might be structured thematically
  • Transcripts contains transcriptions in XML files
  • Documents contains miscellaneous documents, e. g. typing conventions

Conversion workflow

To make their work easier, the scholars are free to use some shorthands for XML elements and to type formulas in LaTeX notation. Before uploading, four scripts convert the XML into a version that can be uploaded via the uploading interface:

  1. cleanURL.py replaces ampersands in URLs and also removes the xsi:schemaLocation from the header
  2. bibtex_linker.py uses input from the Biblatex files (see above) and the Python library bibtexparser to create links in the XML to the online resources or, if there is none, to the bibliography on ECHO.
  3. mathml-wrapper.py converts LaTeX math code into MathML.
  4. adjustHarriot.xsl does the main work of converting the Harriot-specific markup into ECHO-conformant elements.
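To illustrate the first step, here is a rough sketch of the kind of clean-up cleanURL.py is described as doing. It is not the actual script: the function name clean is hypothetical, and as a simplification it escapes every bare ampersand in the text rather than only those inside URLs.

```python
import re

# An ampersand that does not already start an entity such as &amp; or &#38;
BARE_AMPERSAND = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

# The xsi:schemaLocation attribute, including the whitespace in front of it
SCHEMA_LOCATION = re.compile(r'\s+xsi:schemaLocation\s*=\s*"[^"]*"')


def clean(text):
    """Escape bare ampersands and drop the first xsi:schemaLocation attribute."""
    text = BARE_AMPERSAND.sub("&amp;", text)
    return SCHEMA_LOCATION.sub("", text, count=1)
```

Working on the raw text instead of a parsed tree is deliberate here: a file with unescaped ampersands is not well-formed, so an XML parser would reject it before any fix could be applied.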

A shell function is quite convenient for dealing with all the scripts in a row and checking the XML for validity in between (adjust paths to your needs):

harriot() {
    # Convert one Harriot transcription ($1) into an ECHO-conformant file.
    export SCRIPTS="/Users/kthoden/ECHO_svn/schema/scripts/Harriot"
    export ECHO_SCRIPTS_DIR="/Users/kthoden/src/eclipse/projects/digitizing-tools/scripts"
    local schema_dir="/Users/kthoden/ECHO_svn/schema"
    local input="$1"
    # foo.xml becomes foo-adjusted.xml
    local output="${input%.xml}-adjusted.xml"

    echo "============= $input ============="
    if [[ -r "$output" ]]; then
        echo "Removing previous conversion file"
        rm -v "$output"
    fi
    # Each step only runs if the previous one succeeded.
    echo "Correcting URLs" &&
    python "$SCRIPTS/cleanURL.py" "$input" &&
    echo "Checking for well-formed XML" &&
    xmllint --noout 01_cleanedURL.xml &&
    echo "Adding bibliography items" &&
    python3 "$SCRIPTS/bibtex_linker.py" 01_cleanedURL.xml &&
    echo "Converting math" &&
    python "$ECHO_SCRIPTS_DIR/mathml/mathml-wrapper.py" --outputTextFile=02_mathConverted.xml --console=/tmp/console.txt 01_cleanedURL-bib.xml &&
    echo "Applying Harriot-specific adjustments to $input" &&
    java -jar "$schema_dir/thirdparty/saxonhe9-2-1-1j/saxon9he.jar" -xsl:"$SCRIPTS/adjustHarriot.xsl" -s:02_mathConverted.xml -o:"$output" &&
    echo "Removing temporary files" &&
    rm -v 01_cleanedURL.xml 01_cleanedURL-bib.xml 02_mathConverted.xml &&
    echo "Is it valid?" &&
    java -jar "$schema_dir/thirdparty/jing-20091111/bin/jing.jar" -c "$schema_dir/schema/echo/echo.rnc" "$output"
    echo "Finis."
    echo "===================================="
}

The resulting files carry the infix -adjusted in their names. They then have to be moved to trunk/texts/eXist/echo/en (for the time being) and renamed before they can be ingested into the ECHO system.
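The move-and-rename step might be automated along the following lines. This is a sketch under an explicit assumption: "renaming" is taken to mean dropping the -adjusted infix again, which this page does not actually specify. The function names are hypothetical.

```python
import shutil
from pathlib import Path

INFIX = "-adjusted"


def target_path(adjusted, dest_dir):
    """Map e.g. foo-adjusted.xml to dest_dir/foo.xml (assumed naming rule)."""
    stem = adjusted.stem
    if not stem.endswith(INFIX):
        raise ValueError("not an adjusted file: %s" % adjusted)
    return dest_dir / (stem[: -len(INFIX)] + adjusted.suffix)


def move_adjusted(src_dir, dest_dir):
    """Move all *-adjusted.xml files from src_dir to dest_dir and return the new paths."""
    moved = []
    for path in sorted(Path(src_dir).glob("*" + INFIX + ".xml")):
        target = target_path(path, Path(dest_dir))
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

If the files should keep a different name in the eXist directory, only target_path needs to change.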

Map conversion workflow

Export to static maps

The maps can be exported to HTML, and a browsable webpage exists that displays all of them.

Export from yEd has to be done manually for each map (there is no mass exporter), but the settings remain stable per session. The required steps are:

  1. Set the export directory (remains stable per session)
  2. Export as HTML-Imagemap with the following settings
    1. Clipping: Default (leave settings as they are)
    2. Image: Choose PNG and Antialiasing
    3. HTML: uncheck both boxes (open link in new window, export description as tooltip) and replace the existing template with the following code (adapted to ECHO's Zope environment):
      <html metal:use-macro="here/main_template/macros/page">
      <head><title></title>
      <!-- insert date here -->
      <style type="text/css">
      .tooltip {
        font-size:10pt;
        background-color:#FFFFCC;
        border:1px solid black;
        padding:2px
      }
      </style>
      </head>
      <body>
      <span metal:fill-slot="body">
      <a target="_blank" href="./index.pt">Index of topics</a>&nbsp;&nbsp;&nbsp;<a target="_blank" href="./legendManuscripts.pt">Legend</a>
      
      ${DIAGRAM}
      </span>
      </body>
      </html>
      
    4. The "Sources" section contains links to external sources describing the persons (Wikipedia, DNB, VIAF etc.). A little more code has to be kept in the page to make the small menu work that pops up when a name is clicked:
      <html metal:use-macro="here/main_template/macros/page">
        <head>
          <title>Empty Title</title>
      <!-- insert date here -->
        </head>
        <body>
          <span metal:fill-slot="body">
            <style type="text/css">
              #authorMenu {display:none;
              position:absolute;
              z-index:200; /* always on top*/
              padding-left: 35px;
              margin-left: 100px;
              margin-top: 12em;
              width: 250px;
              border: 2px solid rgba(128, 128, 128, 0.5);
              border-style: ridge;
              border-radius: 10px;
              background: rgba(128, 128, 128, 0.5);
              /* background-color: #777; */
              color: white;
              font-size: 0.95em;
              }
            </style>
            <script type="text/javascript">
              function showElement(layer){
              var myLayer = document.getElementById(layer);
              if(myLayer.style.display=="none"){
              myLayer.style.display="block";
              myLayer.style.backgroundPosition="top";
              } else {
              myLayer.style.display="none";
              }
              }
            </script>
            <a target="_blank" href="./index.pt">Index of topics</a>&nbsp;&nbsp;&nbsp;<a target="_blank" href="./legendManuscripts.pt">Legend</a>
            <!-- insert menu here -->
            ${DIAGRAM}
          </span>
        </body>
      </html>
      
      
    5. Furthermore, a different script has to be executed on these maps. It also requires the persons maps to be exported separately from the others (although it would probably not hurt to have the CSS and JavaScript snippets in the other maps as well)
    6. Tiling: Do not activate Tiling
  3. To export all the maps, it is best to open all the files, export the first one (thereby applying the settings above) and close it. After that, the following key sequence can be used: Cmd-E, Return, Return, Cmd-W. This will export and close each map.
  4. The resulting HTML files have to be edited, because a link to another map (i. e. another graphml file) will retain the extension graphml in the source code. The Python script html2pt.py takes care of this replacement and renames the files to the extension *.pt. The script can be called in a loop: for i in *.html; do echo "$i"; python /Users/kthoden/XML-ECHO-SVN/trunk/schema/scripts/Harriot/html2pt.py "$i"; done (adjust path).
  5. The persons maps also require another script, because so far the information about the external links is not included in the maps. Right now, it resides in the file persons.csv, which is evaluated by the script menuMaker.
  6. The pt-files have to be copied to tuxserve03:18021/echo_nav/echo_pages/content/scientific_revolution/harriot/maps
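The link replacement in step 4 can be pictured roughly as follows. This is not the html2pt script itself but a minimal sketch, assuming that links to other maps appear as href attributes ending in .graphml. The function names are hypothetical.

```python
import re
from pathlib import Path

# An href value that ends in .graphml, optionally followed by a fragment or query
GRAPHML_LINK = re.compile(r'(href\s*=\s*"[^"]*?)\.graphml(["#?])', re.IGNORECASE)


def rewrite_links(html):
    """Point links to other maps at the .pt pages instead of the graphml files."""
    return GRAPHML_LINK.sub(r"\1.pt\2", html)


def convert(html_file):
    """Write the rewritten page next to the original as *.pt and remove the *.html file."""
    html_file = Path(html_file)
    target = html_file.with_suffix(".pt")
    target.write_text(rewrite_links(html_file.read_text(encoding="utf-8")), encoding="utf-8")
    html_file.unlink()
    return target
```

Links that do not end in .graphml, for example those to external pages, are left untouched.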

Export to interactive maps

There is also an experimental representation of the maps as an RDF graph (Enter here). As of now, it is only accessible inside the institute.

The visualization is made with a tool called LodLive, and the graph can also be queried by going to the query page. Instructions on how to use the tool are attached to this page.

Thorough documentation will soon be available on Drupal. You can view the source code of the conversion script or check it out from the Mercurial repository with the command

hg clone https://it-dev.mpiwg-berlin.mpg.de/hg/graphML2RDF

Index of groups

There is now also an index page, generated directly from the graphml files. It is an alphabetical list of all the group headings. From inside the Maps directory, call the makeIndex script (at least Python 2.7 required) and copy the resulting index.pt to tuxserve03:18021/echo_nav/echo_pages/content/scientific_revolution/harriot/maps as well.
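How such an index could be collected is sketched below. This is not the makeIndex script. It assumes the usual yEd serialization, in which group nodes carry the attribute yfiles.foldertype and their heading sits in a y:NodeLabel inside a y:GroupNode. The function names are hypothetical, and the sketch targets Python 3.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"
NS = {"y": "http://www.yworks.com/xml/graphml"}


def group_headings(graphml_text):
    """Return the labels of all group nodes in one graphml document."""
    root = ET.fromstring(graphml_text)
    headings = []
    for node in root.iter("{%s}node" % GRAPHML_NS):
        if node.get("yfiles.foldertype") not in ("group", "folder"):
            continue
        # yEd stores one label per open/closed state; the first one is enough.
        label = node.find(".//y:GroupNode/y:NodeLabel", NS)
        if label is not None and label.text and label.text.strip():
            headings.append(label.text.strip())
    return headings


def index_entries(documents):
    """Merge the headings of several documents into one alphabetical list."""
    headings = set()
    for text in documents:
        headings.update(group_headings(text))
    return sorted(headings, key=str.casefold)
```

Sorting with str.casefold keeps upper- and lower-case headings together. Turning the list into the index.pt page is left out here.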

Linked data

In the DM2E project, the library's metadata was also transformed into RDF triples, for example:

<http://data.dm2e.eu/data/item/mpiwg/harriot/MPIWG:1SNN2GVM>
        a                      edm:ProvidedCHO ;
        dm2e:callNumber        "HMC 240 III" ;
        dm2e:levelOfHierarchy  1 ;
        dm2e:publishedAt       <http://data.dm2e.eu/data/place/mpiwg/harriot/MPIWG:1SNN2GVM/Petworth_UK> ;
        dc:description         "Scanned Document by the DIGIGROUP" ;
        dc:identifier          "MPIWG:1SNN2GVM" ;
        dc:language            "en" ;
        dc:title               "Harriot papers: 254-321 [HMC 240 III]" ;
        dc:type                dm2e:Manuscript ;
        pro:author             <http://data.dm2e.eu/data/agent/mpiwg/authority_gnd/118720473> ;
        void:inDataset         <http://data.dm2e.eu/data/dataset/mpiwg/harriot/20150113133045860> ;
        edm:type               "TEXT" .

This record can also be displayed on the DM2E datastore.

It was then exported to Europeana, which in turn links back to the MPIWG view.

Exemplary editing with Pundit

On top of that, the Pundit annotation system can be used to annotate the pages, as described elsewhere on this wiki. The bookmarklets from the page http://it-dev.mpiwg-berlin.mpg.de/pundit2/build/texttoolbm.html have to be installed in the browser.

Using a version of MS 6782 with all of the commentary removed, a Pundit notebook was started, inserting some existing annotations and using the representations of the pages in the datastore and the language technology to link Latin words to MPIWG's dictionaries. These annotations are also RDF triples.

In the notebook, clicking on the plus in the bottom right corner gives you additional information on the annotation and also the link to jump to the page.

Another visualization of the annotations is this graph in LodLive: http://demo-lodlive.thepund.it/?http://purl.org/pundit/demo-cloud-server/user/c6daa448

Last modified on Aug 19, 2015, 3:27:06 PM
