= MPIWG-MPDL Content Project =

This is the wiki for the XML Workflow Service subproject within the [https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/mpdl-project-content/attachment/wiki/WikiStart/MPDL_project_desc.pdf cooperative project between the MPIWG and MPDL]. The other subproject is the [https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/mpdl-project-software Software Development Project].

At present, two working groups are active: "Data Entry Specs WG" and "Document Schema WG". For protocols, check the ProtocolIndex. The protocol of the meeting with the MPDL (Nov 2008) can be found [http://colab.mpdl.mpg.de/mediawiki/Talk:MPDL_Project_XML_Workflow here].

== 1. Data Entry Specs WG ==

=== Existing data entry specs ===

Some preliminary notes on character issues can be found on the page CharacterIssues.

Some old DE specs can be found under LegacySpecs.

Raw data entry versions of Archimedes texts can be accessed through the [http://archimedes.mpiwg-berlin.mpg.de/cvs-web/read/cvswebread.cgi/texts/archimedes/raw/ WebCVS interface].

Malcolm's [wiki:HighLevelRequirements high level requirements] from the large team meeting on 2008-09-11.

=== Text workflow ===

Some [wiki:SampleTexts Sample texts] from the ECHO collection.

For problems with and requests for ECHO, see [wiki:EchoRemarks here].

["Provisional list"] of books to be transcribed: Batch 1 and 2 from this list have been sent to China.

["First evaluation"] of the work sample.

The next batch of books can be found [http://fm8-server.mpiwg-berlin.mpg.de/fmi/iwp/res/iwp_home.html here] (click on "Digitalisierung").

An overview of the books possibly included in the next batch can be found [wiki:"Intermediate Batch 6" here].

Some [wiki:FormaxQueries letters] from [http://www.formax.com.cn/ Formax] have been copied to the wiki for further discussion.

[wiki:OverviewWorkOrders2008 Overview] of the five Work Orders to Formax in 2008.

=== References on encoding ===

 * [http://www.cs.tut.fi/~jkorpela/chars.html A tutorial on character code issues] (by J. Korpela)
 * [http://www.unicode.org/ Unicode Home Page]
 * [http://www.unicode.org/charts/ Code Charts By Script (Unicode 5.1)]
 * [http://proquest.safaribooksonline.com/9780596102425/dedication Fonts & Encodings] (O'Reilly book by Y. Haralambous, English ed.)
 * [http://www.w3.org/2003/entities/iso8879doc/overview.html ISO 8879 entities] (from W3C)
 * [http://www.tei-c.org The Text Encoding Initiative (TEI)]
 * [http://www.tlg.uci.edu/BetaCode.html Beta Code]

=== Additional resources ===

Some material about [wiki:GreekLigatures Greek Ligatures].

Our [wiki:BookRecommendations book recommendations].

On abbreviations: [http://libcat.mpiwg-berlin.mpg.de/bibsys/FMPro?-db=bibliothekskatalog_mpiwg&-lay=cgi&All_fields=abbreviatu&-format=record.html&-max=1&-find= Lexicon abbreviaturarum] by Adriano Cappelli.

=== Completed specs ===

See [wiki:DataEntrySpecs here].

=== Results from China ===

See [http://pythia.mpiwg-berlin.mpg.de/department1/mpdl/raw-texts here].

== 2. Document Schema WG ==

Malcolm's [wiki:SchemaHighLevelRequirements Schema high level requirements] from the large team meeting on 2008-09-18.

Two documents that will serve as starting points for the Document Schema can be found [https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/mpdl-project-content/attachment/wiki/WikiStart/echo_V1.xml here] and [https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/mpdl-project-content/attachment/wiki/WikiStart/ECHO00001A2B3CX_V2.xml here].

=== References ===

 * Relax NG
   * [http://proquest.safaribooksonline.com/0596004214/relax-PREFACE-2 Relax NG] (O'Reilly book by E. van der Vlist)
   * [http://books.xmlschemata.org/relaxng/ The GFDL release] of this book, along with updates (html)
   * [http://www.thaiopensource.com/relaxng/trang.html trang] (open source schema converter written in Java)
   * [http://relaxng.org/compact-tutorial-20030326.html RELAX NG Compact Syntax Tutorial]
   * a [https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/mpdl-project-content/attachment/wiki/WikiStart/schema.tar.gz zipped tarball] with schemas for some well-known document types
   * [http://enlil.museum.upenn.edu/cdl/doc/XDF XDF]: XML Documentation Format (literate programming with Relax NG)
 * [http://dublincore.org/ Dublin Core (Metadata)]
 * [http://www.loc.gov/standards/iso639-2/ ISO 639-2] (codes for natural languages)
 * [http://www.w3.org/TR/NOTE-datetime ISO 8601] (date and time formats; brief reference from W3C)

=== Tools ===

 * [http://xmlstar.sourceforge.net/ XMLStarlet] Command Line XML Toolkit
 * [http://www.xmlsoft.org/ libxml2] contains ''xmllint'', a command-line tool for validating XML
 * [http://www.id.cbs.dk/~dh/corpus/tools/MXTERMINATOR.html MXTerminator] A tool for sentence boundary detection

== Documentation about trac ==

 * WikiFormatting -- detailed description of available Wiki formatting commands
 * TracGuide -- Built-in Documentation
 * [http://trac.edgewall.org/ The Trac project] -- Trac Open Source Project
 * [http://trac.edgewall.org/wiki/TracFaq Trac FAQ] -- Frequently Asked Questions
 * TracSupport -- Trac Support

For a complete list of local wiki pages, see TitleIndex.