XML Encoding
The scholars of the Harriot online project work with an enhanced version of the ECHO XML schema that allows them to insert commentary and editorial remarks. In a second step, this markup has to be converted to the regular ECHO schema before the texts can be uploaded and displayed in the ECHO display environment. The LaTeX shorthand for math also has to be transformed into MathML.
Editing
Scholars working on the edition currently use either Emacs or XeMeL, an Eclipse-based XML editor with an SVN client. We recommend text editors that offer schema-based autocompletion, which facilitates editing. Two versions of the schema are attached:
- an XSD version which can be used while editing texts in XeMeL. To make it work properly, the following line has to be added to the echo tag at the top of the document: xsi:schemaLocation="http://www.mpiwg-berlin.mpg.de/ns/echo/1.0/ harriot_xsd/echo.xsd "
- and an RNC version for use in e. g. Emacs
More information can be found on the general page on XML editing.
For formulas, it is possible to use LaTeX markup. A script converts that code into MathML (link).
Upgrading XeMeL
As of 2015, the repository uses Subversion 1.7, which means that local working copies also have to be upgraded (by issuing svn upgrade on the command line). Existing copies of XeMeL need to be updated as well. A new version is located at http://ocropus.mpiwg-berlin.mpg.de/~kthoden/XeMeL.zip. If you want to update the copy yourself, download http://subclipse.tigris.org/files/documents/906/49280/site-1.8.22.zip and http://www.svnkit.com/org.tmatesoft.svn_1.7.14.src.zip and extract them in the root folder of XeMeL (so that the files end up in the plugins directory and so on). Also, in the preferences of XeMeL, the correct SVN interface has to be chosen: SVNKit (PureJava).
Macros
For both Eclipse and Emacs, templates have been created to quickly insert commentaries and translations. As these will later be fed into the annotation system, the date and username are inserted as well. Unfortunately, in Eclipse the date format depends on your machine's language settings. Both template files are in the SVN repository.
Map Editing
The maps are edited with the free (but not open source) editor yEd. The symbols for the maps are stored in a palette, which makes sure that the right symbols are used.
Bibliography
Two bibliographies are kept, one for secondary literature, the other for source books that Harriot used and referred to. Both are in Biblatex format, and an HTML version of each is included in the ECHO pages of this project.
The conversion to HTML is done at the moment using bibtex2html 1.96 with the following commands:
bibtex2html -o Sources -nobibsource -unicode -a -i Sources.bib
bibtex2html -o Bib -nobibsource -unicode -a -i Bibliography.bib
Each resulting HTML document contains a table with the bibliographical entries. These tables are to be inserted into the following template
<h3><a name="sources">Sources used by Thomas Harriot</a></h3>
<!-- Sources table goes here -->
<hr/>
<h3><a name="bib">Bibliography</a></h3>
<!-- Bibliography table goes here -->
and then inserted into the Zope page found at http://echo.mpiwg-berlin.mpg.de/content/scientific_revolution/harriot/project_infos/harriot-bibliography
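The two conversion calls and the template step could be scripted roughly as follows. This is a minimal Python sketch, not part of the project's tooling: the function names (bibtex2html_command, extract_table, build_page, main) are hypothetical, and it assumes that bibtex2html writes Sources.html and Bib.html as UTF-8 and that the first table in each file is the one holding the entries.

```python
import re
import subprocess

# Template from the text above, with placeholders for the two tables.
TEMPLATE = """<h3><a name="sources">Sources used by Thomas Harriot</a></h3>
{sources}
<hr/>
<h3><a name="bib">Bibliography</a></h3>
{bib}
"""


def bibtex2html_command(output_stem, bib_file):
    """Return the bibtex2html call quoted in the text as an argument list."""
    return ["bibtex2html", "-o", output_stem, "-nobibsource",
            "-unicode", "-a", "-i", bib_file]


def extract_table(html_text):
    """Return the first <table>...</table> block of an HTML string."""
    match = re.search(r"<table.*?</table>", html_text,
                      re.DOTALL | re.IGNORECASE)
    if match is None:
        raise ValueError("no table found in bibtex2html output")
    return match.group(0)


def build_page(sources_html, bib_html):
    """Insert both tables into the template."""
    return TEMPLATE.format(sources=extract_table(sources_html),
                           bib=extract_table(bib_html))


def main():
    """Run both conversions and print the fragment for the Zope page."""
    for stem, bib in (("Sources", "Sources.bib"), ("Bib", "Bibliography.bib")):
        subprocess.run(bibtex2html_command(stem, bib), check=True)
    with open("Sources.html", encoding="utf-8") as handle:
        sources_html = handle.read()
    with open("Bib.html", encoding="utf-8") as handle:
        bib_html = handle.read()
    print(build_page(sources_html, bib_html))
```

Calling main() from the directory that holds the two .bib files prints the fragment, which still has to be pasted into the Zope page by hand.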
Storage of files
As of September 2012, the project's files are also part of the MPIWG-MPDL Content Project's repository. This will make updating the repository or local copies much easier. The respective branch can be checked out by directing a Subversion client to https://it-dev.mpiwg-berlin.mpg.de/svn/mpdl-project-content/trunk/texts/Harriot. The structure of the repository remains the same:
- Maps contains schematic maps of how the folios might be structured thematically
- Transcripts contains transcriptions in XML files
- Documents contains miscellaneous documents, e. g. typing conventions
Conversion workflow
To make the scholars' work easier, shorthands for XML elements and LaTeX-style formula encoding are allowed. Before uploading, four scripts convert the XML into a version that can be uploaded via the uploading interface:
- cleanURL.py replaces ampersands in URLs and also removes the xsi:schemaLocation from the header
- bibtex_linker.py uses input from the Biblatex files (see above) and the Python library bibtexparser to create links in the XML to the online resources or, if there is none, to the bibliography on ECHO.
- mathml-wrapper.py converts LaTeX math code into MathML.
- adjustHarriot.xsl does the main work of converting the Harriot-specific markup into ECHO-conformant elements.
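To illustrate the kind of clean-up the first step performs, here is a minimal Python sketch. It is not the project's cleanURL.py: the function name clean and both regular expressions are assumptions, and for simplicity it escapes every bare ampersand in the file rather than only those inside URLs.

```python
import re

# A bare "&" that does not already start an entity such as &amp; or &#38;
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

# The xsi:schemaLocation attribute together with its leading whitespace
SCHEMA_LOCATION = re.compile(
    r"""\s+xsi:schemaLocation\s*=\s*(?:"[^"]*"|'[^']*')""")


def clean(xml_text):
    """Escape bare ampersands and drop the first xsi:schemaLocation attribute."""
    xml_text = BARE_AMP.sub("&amp;", xml_text)
    return SCHEMA_LOCATION.sub("", xml_text, count=1)
```

Working on the raw text instead of a parsed tree is deliberate here: a file with unescaped ampersands is not well-formed, so an XML parser would reject it before any repair could happen.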
A shell function is quite convenient for dealing with all the scripts in a row and checking the XML for validity in between (adjust paths to your needs):
harriot() {
    # Adjust these paths to your needs
    export SCRIPTS="/Users/kthoden/ECHO_svn/schema/scripts/Harriot"
    export ECHO_SCRIPTS_DIR="/Users/kthoden/src/eclipse/projects/digitizing-tools/scripts"
    # Name of the output file: foo.xml becomes foo-adjusted.xml
    local adjusted
    adjusted=$(echo "$*" | sed 's/\.xml$/-adjusted.xml/')

    echo "============= $* ============="
    if [[ -r "$adjusted" ]]; then
        echo "Removing previous conversion file"
        rm -v "$adjusted"
    fi
    echo "Correcting URLs"
    python "$SCRIPTS/cleanURL.py" "$*" &&
    echo "Checking for wellformed XML"
    xmllint --noout 01_cleanedURL.xml &&
    echo "Adding bibliography items" &&
    python3 "$SCRIPTS/bibtex_linker.py" 01_cleanedURL.xml
    echo "Then math output" &&
    python "$ECHO_SCRIPTS_DIR/mathml/mathml-wrapper.py" --outputTextFile=02_mathConverted.xml --console=/tmp/console.txt 01_cleanedURL-bib.xml &&
    echo "Doing more adjustments with $*"
    java -jar /Users/kthoden/ECHO_svn/schema/thirdparty/saxonhe9-2-1-1j/saxon9he.jar -xsl:"$SCRIPTS/adjustHarriot.xsl" -s:02_mathConverted.xml -o:"$adjusted" &&
    echo "Removing temporary files"
    rm -v 01_cleanedURL.xml 01_cleanedURL-bib.xml 02_mathConverted.xml &&
    echo "Is it valid?"
    # Validate the result against the regular ECHO schema
    java -jar /Users/kthoden/ECHO_svn/schema/thirdparty/jing-20091111/bin/jing.jar -c /Users/kthoden/ECHO_svn/schema/schema/echo/echo.rnc "$adjusted"
    echo "Finis."
    echo "===================================="
}
The resulting files carry an -adjusted infix. They then have to be moved to trunk/texts/eXist/echo/en (for the time being) and renamed, so that they can be ingested into the ECHO system.
Map conversion workflow
Export to static maps
The maps can be exported to HTML, and a web page exists where all the maps are displayed and can be browsed.
Export from yEd has to be done manually for each map (there is no mass exporter), but the settings remain stable per session. The required steps are:
- Set the export directory (remains stable per session)
- Export as HTML-Imagemap with the following settings
- Clipping: Default (leave settings as they are)
- Image: Choose PNG and Antialiasing
- HTML: uncheck both boxes (open link in new window, export description as tooltip) and replace the existing template with the following code (adapted to ECHO's Zope environment):
<html metal:use-macro="here/main_template/macros/page">
<head><title></title>
<!-- insert date here -->
<style type="text/css">
.tooltip { font-size:10pt; background-color:#FFFFCC; border:1px solid black; padding:2px }
</style>
</head>
<body>
<span metal:fill-slot="body">
<a target="_blank" href="./index.pt">Index of topics</a>
<a target="_blank" href="./legendManuscripts.pt">Legend</a>
${DIAGRAM}
</span>
</body>
</html>
- The "Sources" section contains links to external sources, describing the persons (Wikipedia, DNB, VIAF &c). A little bit more code has to be kept in the page to make the little menu work that pops up when the name is being clicked:
<html metal:use-macro="here/main_template/macros/page">
<head>
<title>Empty Title</title>
<!-- insert date here -->
</head>
<body>
<span metal:fill-slot="body">
<style type="text/css">
#authorMenu {
  display: none;
  position: absolute;
  z-index: 200; /* always on top */
  padding-left: 35px;
  margin-left: 100px;
  margin-top: 12em;
  width: 250px;
  border: 2px solid rgba(128, 128, 128, 0.5);
  border-style: ridge;
  border-radius: 10px;
  background: rgba(128, 128, 128, 0.5);
  /* background-color: #777; */
  color: white;
  font-size: 0.95em;
}
</style>
<script type="text/javascript">
function showElement(layer){
  var myLayer = document.getElementById(layer);
  if(myLayer.style.display=="none"){
    myLayer.style.display="block";
    myLayer.style.backgroundPosition="top";
  } else {
    myLayer.style.display="none";
  }
}
</script>
<a target="_blank" href="./index.pt">Index of topics</a>
<a target="_blank" href="./legendManuscripts.pt">Legend</a>
<!-- insert menu here -->
${DIAGRAM}
</span>
</body>
</html>
- Furthermore, a different script has to be executed on these maps. It also requires the persons maps to be exported separately from the others (although it would probably not hurt to have the CSS and JavaScript snippets in the other maps as well).
- Tiling: Do not activate Tiling
- To export all the maps, it is best to open all the files, start to export the first one (thereby setting above settings) and close that one. After that the following key sequence can be used: Cmd-E, Return, Return, Cmd-W. This will export and close each map.
- The resulting html files have to be edited, because a link to another map (i. e. another graphml file) will retain the extension graphml in the source code. The python script html2pt takes care of this replacement and renames the files to the extension *.pt. The script can be called in a loop (adjust path):
for i in *.html; do echo $i; python /Users/kthoden/XML-ECHO-SVN/trunk/schema/scripts/Harriot/html2pt.py $i; done
- The persons maps also require another script, because so far the information about the external links is not included. Right now, it resides in the file persons.csv which is then evaluated by the script menuMaker.
- The pt-files have to be copied to tuxserve03:18021/echo_nav/echo_pages/content/scientific_revolution/harriot/maps
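The link replacement described for html2pt can be sketched in a few lines of Python. This is an illustration only, not the project's script: the names fix_links, pt_name and convert are hypothetical, and it assumes that links to other maps appear as href attributes ending in .graphml. Unlike the real script, which renames the file, this sketch writes a new .pt file and leaves the exported HTML file in place.

```python
import re
from pathlib import Path

# An href whose target ends in .graphml (optionally followed by # or ?)
GRAPHML_LINK = re.compile(r'(href\s*=\s*["\'][^"\']*?)\.graphml(?=["\'#?])')


def fix_links(html_text):
    """Point links to other maps at the .pt pages instead of .graphml files."""
    return GRAPHML_LINK.sub(r"\1.pt", html_text)


def pt_name(html_path):
    """Return the target file name with the extension changed to .pt."""
    return Path(html_path).with_suffix(".pt")


def convert(html_path):
    """Write a corrected copy of one exported map next to the original."""
    source = Path(html_path)
    target = pt_name(source)
    target.write_text(fix_links(source.read_text(encoding="utf-8")),
                      encoding="utf-8")
    return target
```

A loop such as `for path in Path(".").glob("*.html"): convert(path)` would then process a whole export directory.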
Export to interactive maps
There is also an experimental representation of the maps as an RDF graph (Enter here). As of now, it is only accessible inside the institute.
The visualization is made with a tool called LodLive, and the graph can also be queried by going to the query page. Instructions on how to use the tool are attached to this page.
Thorough documentation will soon be available on Drupal. You can view the source code of the conversion script or check it out from the Mercurial repository with the command
hg clone https://it-dev.mpiwg-berlin.mpg.de/hg/graphML2RDF
Index of groups
There is now also an index page, generated directly from the graphml-files. This is an alphabetical list of all the group headings. From inside the Maps directory, call the makeIndex script (at least Python 2.7 required) and copy the resulting index.pt also to tuxserve03:18021/echo_nav/echo_pages/content/scientific_revolution/harriot/maps.
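How group headings might be collected from graphml files is sketched below in Python 3. This is not the makeIndex script: all function names are hypothetical, and the sketch assumes the usual yEd layout in which a group node carries a yfiles.foldertype attribute and its heading sits in a y:NodeLabel inside a y:GroupNode. The output format of the real index.pt is not documented here, so render_list merely produces a plain HTML list.

```python
import html
import xml.etree.ElementTree as ET

GRAPHML_NODE = "{http://graphml.graphdrawing.org/xmlns}node"
Y_NS = {"y": "http://www.yworks.com/xml/graphml"}


def group_headings(root):
    """Collect the label text of every group node below a graphml root."""
    headings = set()
    for node in root.iter(GRAPHML_NODE):
        if node.get("yfiles.foldertype") is None:
            continue  # ordinary node, not a group
        # The first matching label in document order is the group's own,
        # assuming the node's data precedes any nested graph.
        label = node.find(".//y:GroupNode/y:NodeLabel", Y_NS)
        text = (label.text or "").strip() if label is not None else ""
        if text:
            headings.add(text)
    return headings


def sorted_index(roots):
    """Merge the headings of several maps into one alphabetical list."""
    merged = set()
    for root in roots:
        merged |= group_headings(root)
    return sorted(merged, key=str.casefold)


def render_list(headings):
    """Render the headings as a simple HTML list."""
    items = "\n".join("<li>{}</li>".format(html.escape(h)) for h in headings)
    return "<ul>\n{}\n</ul>".format(items)
```

For files on disk, `ET.parse(path).getroot()` yields the root element expected by group_headings.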
Linked data
In the DM2E project, the library's metadata was also transformed into RDF triples, for example:
<http://data.dm2e.eu/data/item/mpiwg/harriot/MPIWG:1SNN2GVM> a edm:ProvidedCHO ;
    dm2e:callNumber "HMC 240 III" ;
    dm2e:levelOfHierarchy 1 ;
    dm2e:publishedAt <http://data.dm2e.eu/data/place/mpiwg/harriot/MPIWG:1SNN2GVM/Petworth_UK> ;
    dc:description "Scanned Document by the DIGIGROUP" ;
    dc:identifier "MPIWG:1SNN2GVM" ;
    dc:language "en" ;
    dc:title "Harriot papers: 254-321 [HMC 240 III]" ;
    dc:type dm2e:Manuscript ;
    pro:author <http://data.dm2e.eu/data/agent/mpiwg/authority_gnd/118720473> ;
    void:inDataset <http://data.dm2e.eu/data/dataset/mpiwg/harriot/20150113133045860> ;
    edm:type "TEXT" .
This record can also be displayed on the DM2E datastore. It was then exported to Europeana, which in turn links back to the MPIWG view.
Exemplary editing with Pundit
On top of that, the Pundit annotation system can be used to annotate the pages, as described elsewhere on this wiki. The bookmarklets from the page http://it-dev.mpiwg-berlin.mpg.de/pundit2/build/texttoolbm.html have to be installed in the browser.
Using a version of MS 6782 with all of the commentary removed, a Pundit notebook was started, inserting some existing annotations and using the representations of the pages in the datastore and the language technology to link Latin words to MPIWG's dictionaries. These annotations are also RDF triples.
In the notebook, clicking on the plus in the bottom right corner gives you additional information on the annotation and also the link to jump to the page.
Another visualization of the annotations is this graph in LodLive: http://demo-lodlive.thepund.it/?http://purl.org/pundit/demo-cloud-server/user/c6daa448
Attachments (3)
- using_lodlive.pdf (122.8 KB): Using LodLive
- harriot_xsd.zip (29.4 KB): XSD Schema for Harriot
- harriot_rnc.zip (32.5 KB): RNC Schema for Harriot