[[PageOutline(1-4,,pullout)]]

= XML Encoding =

The scholars of the Harriot online project work with an enhanced version of the ECHO XML schema that allows them to insert commentary and editorial remarks. In a second step, this markup has to be converted to the regular ECHO schema before the files can be uploaded and displayed in the ECHO display environment. In addition, the LaTeX shorthand for math has to be transformed into MathML.

== Editing ==

Scholars working on the edition currently use either Emacs or XeMeL, an Eclipse-based XML editor with an SVN client. We recommend text editors that offer autocompletion based on the schema, which facilitates the editing. Two versions of that schema are attached:
 * an XSD version which can be used while editing texts in XeMeL. To make it work properly, the following line has to be added to the {{{echo}}} tag at the top of the document:
 {{{
 xsi:schemaLocation="http://www.mpiwg-berlin.mpg.de/ns/echo/1.0/ harriot_xsd/echo.xsd "
 }}}
 * an RNC version for use in, e.g., Emacs

More information can be found on [wiki:XMLDESpecs#ConnectiontotheECHOSchemaandAutocompletion the general page on XML editing].

Formulas can be written in LaTeX markup. A script converts that code into MathML ([https://it-dev.mpiwg-berlin.mpg.de/svn/digitizing-tools/scripts/mathml/ link]).

== Upgrading XeMeL ==

As of 2015, the repository uses Subversion 1.7, which means that local working copies have to be upgraded as well (by issuing {{{svn upgrade}}} on the command line). Existing copies of XeMeL also need to be updated. A new version is located at http://ocropus.mpiwg-berlin.mpg.de/~kthoden/XeMeL.zip. If you want to update your copy yourself, download http://subclipse.tigris.org/files/documents/906/49280/site-1.8.22.zip and http://www.svnkit.com/org.tmatesoft.svn_1.7.14.src.zip and extract both in the root folder of XeMeL (so that their contents end up in the {{{plugins}}} directory and so on).
Also, in the preferences of XeMeL, the correct SVN interface has to be chosen: SVNKit (!PureJava).

== Macros ==

For both Eclipse and Emacs, templates have been created to quickly insert commentaries and translations. As these will later be fed into the annotation system, date and username are also inserted. Unfortunately, in Eclipse the date format is dependent on your machine's language settings. Both template files are in the [source:/trunk/texts/Harriot/Documents/ SVN repository].

= Map Editing =

The maps are edited with the free (but not open source) editor [http://www.yworks.com/en/products_yed_about.html yEd]. The symbols for the maps are stored in a [source:/trunk/texts/Harriot/Maps/paletteManuscripts.graphml palette], which ensures that the right symbols are used.

= Bibliography =

Two bibliographies are kept, one for [source:/trunk/texts/Harriot/Documents/Bibliography.bib secondary literature], the other for [source:/trunk/texts/Harriot/Documents/Sources.bib source books] that Harriot used and referred to. Both are in the Biblatex format, and an [http://echo.mpiwg-berlin.mpg.de/content/scientific_revolution/harriot/project_infos/harriot-bibliography HTML version] of them is included in the ECHO pages on this project. The conversion to HTML is currently done using [http://www.lri.fr/~filliatr/bibtex2html/ bibtex2html 1.96] with the following commands:
{{{
bibtex2html -o Sources -nobibsource -unicode -a -i Sources.bib
bibtex2html -o Bib -nobibsource -unicode -a -i Bibliography.bib
}}}

Each resulting HTML document contains a table with the bibliographical entries. These tables are to be inserted into the following template
{{{

Sources used by Thomas Harriot


Bibliography

}}}
and then inserted into the Zope page found at http://echo.mpiwg-berlin.mpg.de/content/scientific_revolution/harriot/project_infos/harriot-bibliography

= Storage of files =

As of September 2012, the project's files are also part of the [source:/trunk/texts/Harriot MPIWG-MPDL Content Project's repository]. This makes updating the repository or local copies much easier. The respective branch can be checked out by directing a Subversion client to {{{https://it-dev.mpiwg-berlin.mpg.de/svn/mpdl-project-content/trunk/texts/Harriot}}}. The structure of the repository remains the same:
 * [source:/trunk/texts/Harriot/Maps Maps] contains schematic maps of how the folios might be structured thematically
 * [source:/trunk/texts/Harriot/Transcripts Transcripts] contains transcriptions in XML files
 * [source:/trunk/texts/Harriot/Documents Documents] contains miscellaneous documents, e.g. typing conventions

= Conversion workflow =

To facilitate their work, the scholars are free to use some shorthands for XML elements and to type formulas in LaTeX style. Before uploading, four scripts convert the XML into a version that can be uploaded via the [http://mpdl-proto.mpiwg-berlin.mpg.de/mpdl/doc/doc-operation-exist.xql uploading interface]:
 1. [source:/trunk/schema/scripts/Harriot/cleanURL.py cleanURL.py] replaces ampersands in URLs and also removes the {{{xsi:schemaLocation}}} from the header.
 1. [source:/trunk/schema/scripts/Harriot/bibtex_linker.py bibtex_linker.py] uses input from the Biblatex files (see above) and the Python library [https://bibtexparser.readthedocs.org/en/latest/index.html bibtexparser] to create links in the XML to the online resources or, if there is none, to the bibliography on ECHO.
 1. [source:/digitizing-tools/scripts/mathml/mathml-wrapper.py mathml-wrapper.py] converts LaTeX math code into MathML.
 1. [source:/trunk/schema/scripts/Harriot/adjustHarriot.xsl adjustHarriot.xsl] does the main work of converting the Harriot-specific markup into ECHO-conformant elements.

A shell function is quite convenient for running all the scripts in a row and checking the XML for validity in between (adjust paths to your needs):
{{{
#!zsh
# Convert one Harriot transcription to the regular ECHO schema.
# Usage: harriot FILE.xml
harriot() {
    local SCHEMA_ROOT="/Users/kthoden/ECHO_svn/schema"
    export SCRIPTS="$SCHEMA_ROOT/scripts/Harriot"
    export ECHO_SCRIPTS_DIR="/Users/kthoden/src/eclipse/projects/digitizing-tools/scripts"
    local input="$1"
    # Strip the .xml extension and append the -adjusted infix.
    local output="${input%.xml}-adjusted.xml"

    # Messages are quoted so that zsh does not expand "=" or "?".
    echo "============= $input ============="
    if [[ -r "$output" ]]; then
        echo "Removing previous conversion file"
        rm -v "$output"
    fi

    # Each step only runs if the previous one succeeded.
    echo "Correcting URLs" &&
    python "$SCRIPTS/cleanURL.py" "$input" &&
    echo "Checking for well-formed XML" &&
    xmllint --noout 01_cleanedURL.xml &&
    echo "Adding bibliography items" &&
    python3 "$SCRIPTS/bibtex_linker.py" 01_cleanedURL.xml &&
    echo "Converting LaTeX math to MathML" &&
    python "$ECHO_SCRIPTS_DIR/mathml/mathml-wrapper.py" --outputTextFile=02_mathConverted.xml --console=/tmp/console.txt 01_cleanedURL-bib.xml &&
    echo "Adjusting Harriot-specific markup in $input" &&
    java -jar "$SCHEMA_ROOT/thirdparty/saxonhe9-2-1-1j/saxon9he.jar" -xsl:"$SCRIPTS/adjustHarriot.xsl" -s:02_mathConverted.xml -o:"$output" &&
    echo "Removing temporary files" &&
    rm -v 01_cleanedURL.xml 01_cleanedURL-bib.xml 02_mathConverted.xml &&
    echo "Is it valid?" &&
    java -jar "$SCHEMA_ROOT/thirdparty/jing-20091111/bin/jing.jar" -c "$SCHEMA_ROOT/schema/echo/echo.rnc" "$output"

    echo "Finis."
    echo "===================================="
}
}}}

The resulting files carry an {{{-adjusted}}} infix. For the time being, they have to be moved to [source:/trunk/texts/eXist/echo/en] and renamed before they can be ingested into the ECHO system.
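
The naming rule for the converted files (insert {{{-adjusted}}} before the {{{.xml}}} extension) can also be sketched in Python, the language of the other conversion scripts. This is only a minimal illustration: the helper name {{{adjusted_name}}} is an assumption and not part of the project's scripts.

```python
from pathlib import Path


def adjusted_name(source):
    """Return the output name for a transcript: foo.xml -> foo-adjusted.xml.

    Hypothetical helper for illustration. Only the final .xml extension is
    touched, so dots or the letters "xml" elsewhere in the name are kept.
    """
    path = Path(source)
    if path.suffix != ".xml":
        raise ValueError("expected an .xml file, got %r" % (source,))
    # Keep the directory part and replace only the file name.
    return str(path.with_name(path.stem + "-adjusted.xml"))


if __name__ == "__main__":
    print(adjusted_name("ReallyClean6782.xml"))
```

Anchoring the rule on the file extension avoids surprises with names that contain the letters "xml" somewhere else, such as {{{my_xml_notes.xml}}}.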

= Map conversion workflow =

== Export to static maps ==

The maps can be exported to HTML, and a browsable [http://echo.mpiwg-berlin.mpg.de/content/scientific_revolution/harriot/maps/0_TOPICS.pt webpage] displays all of them. Export from yEd has to be done manually for each map (there is no mass exporter), but the settings remain stable per session. The required steps are:
 1. Set the export directory (remains stable per session)
 1. Export as HTML-Imagemap with the following settings
   1. Clipping: Default (leave settings as they are)
   1. Image: Choose PNG and Antialiasing
   1. HTML: uncheck both boxes (open link in new window, export description as tooltip) and replace the existing template with the following code (adapted to ECHO's Zope environment):
   {{{
   Index of topics   Legend ${DIAGRAM}
   }}}
     1. The "Sources" section contains links to external sources describing the persons (Wikipedia, DNB, VIAF &c). A little more code has to be kept in the page to make the small menu work that pops up when a name is clicked:
     {{{
     Empty Title Index of topics   Legend ${DIAGRAM}
     }}}
     1. Furthermore, a different script has to be executed on these maps. It also requires the persons maps to be exported separately from the others (but maybe it would not hurt to have the CSS and !JavaScript snippets in the other maps as well)
   1. Tiling: Do not activate Tiling
 1. To export all the maps, it is best to open all the files, export the first one (thereby setting the options above) and close it. After that, the following key sequence can be used: {{{Cmd-E, Return, Return, Cmd-W}}}. This will export and close each map.
 1. The resulting HTML files have to be edited, because a link to another map (i.e. another graphml file) retains the extension graphml in the source code. The Python script [source:/trunk/schema/scripts/Harriot/html2pt.py html2pt] takes care of this replacement and renames the files to the extension {{{*.pt}}}. The script can be called in a loop {{{for i in *.html; do echo "$i"; python /Users/kthoden/XML-ECHO-SVN/trunk/schema/scripts/Harriot/html2pt.py "$i"; done}}} (adjust path).
 1. The persons maps also require another script, because so far the information about the external links is not included. Right now, it resides in the file [source:/trunk/texts/Harriot/Documents/persons.csv persons.csv], which is then evaluated by the script [source:/trunk/schema/scripts/Harriot/menuMaker.py menuMaker].
 1. The pt-files have to be copied to {{{tuxserve03:18021/echo_nav/echo_pages/content/scientific_revolution/harriot/maps}}}

== Export to interactive maps ==

There is also an experimental representation of the maps as an RDF graph ([http://euler.mpiwg-berlin.mpg.de/LodLive/?http://example.org/harriotOnt/HarriotGraph Enter here]). As of now, it is only accessible inside the institute. The visualization is made with a tool called [http://blog.lodlive.it/ LodLive], and the graph can also be queried by going to the [http://euler.mpiwg-berlin.mpg.de/LodLive/ query page]. Instructions on how to use the tool are [https://it-dev.mpiwg-berlin.mpg.de/tracs/mpdl-project-content/raw-attachment/wiki/Harriot/using_lodlive.pdf attached to this page]. Thorough documentation will soon be available on [http://intern.mpiwg-berlin.mpg.de/digitalhumanities/it-group-projects/harriot/dokumentation Drupal]. You can browse [https://it-dev.mpiwg-berlin.mpg.de/tracs/metadataprovider/browser/GraphML2RDF%20(PYTHON) the source code] of the conversion script or check it out from the Mercurial repository with the command
{{{
hg clone https://it-dev.mpiwg-berlin.mpg.de/hg/graphML2RDF
}}}

= Index of groups =

There is now also an index page, generated directly from the graphml files. It is an alphabetical list of all the group headings.
From inside the Maps directory, call the [source:/trunk/schema/scripts/Harriot/makeIndex.py makeIndex] script (at least Python 2.7 required) and copy the resulting {{{index.pt}}} to {{{tuxserve03:18021/echo_nav/echo_pages/content/scientific_revolution/harriot/maps}}} as well.

= Linked data =

In the [http://dm2e.eu DM2E project], the library's metadata was transformed into RDF triples, for example
{{{
 a edm:ProvidedCHO ;
    dm2e:callNumber "HMC 240 III" ;
    dm2e:levelOfHierarchy 1 ;
    dm2e:publishedAt ;
    dc:description "Scanned Document by the DIGIGROUP" ;
    dc:identifier "MPIWG:1SNN2GVM" ;
    dc:language "en" ;
    dc:title "Harriot papers: 254-321 [HMC 240 III]" ;
    dc:type dm2e:Manuscript ;
    pro:author ;
    void:inDataset ;
    edm:type "TEXT" .
}}}

This record can also be displayed in the [http://data.dm2e.eu/data/html/resourcemap/mpiwg/harriot/MPIWG:1SNN2GVM/20150113133045860 DM2E datastore]. It was then exported to [http://www.europeana.eu/portal/record/2048606/data_item_mpiwg_harriot_MPIWG_1SNN2GVM.html Europeana], which in turn links back to the MPIWG view.

== Exemplary editing with Pundit ==

On top of that, the [http://thepund.it/ Pundit annotation system] can be used to annotate the pages, as described [wiki:pundit elsewhere on this wiki]. The bookmarklets from the page http://it-dev.mpiwg-berlin.mpg.de/pundit2/build/texttoolbm.html have to be installed in the browser.

Using a [http://mpdl-system.mpiwg-berlin.mpg.de:30060/mpdl/page-query-result.xql?document=%2Fecho%2Fen%2FReallyClean6782.xml&mode=text&pn=1 version of MS 6782] with all of the commentary removed, a [http://demo-cloud.ask.thepund.it/#/notebooks/802ea36e Pundit notebook] was started. Some existing annotations were inserted, and the representations of the pages in the datastore and the language technology were used to link Latin words to MPIWG's dictionaries. These annotations are also RDF triples.
In the notebook, clicking on the plus in the bottom right corner gives you additional information on the annotation and also the link to jump to the page. Another visualization of the annotations is this graph in !LodLive: http://demo-lodlive.thepund.it/?http://purl.org/pundit/demo-cloud-server/user/c6daa448