= MPIWG-MPDL Content Project =

This is the wiki for the XML Workflow Service subproject within the [https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/mpdl-project-content/attachment/wiki/WikiStart/MPDL_project_desc.pdf cooperative project between the MPIWG and MPDL]. The other subproject is the [https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/mpdl-project-software Software Development Project].

At present, two working groups are active, "Data Entry Specs WG" and "Document Schema WG".

[http://listserver.mpiwg-berlin.mpg.de/mailman/listinfo/mpiwg-mpdl Link] to the project mailing list.

For meeting protocols, check the ProtocolIndex. The protocol of the meeting with the MPDL (Nov 2008) can be found [http://colab.mpdl.mpg.de/mediawiki/Talk:MPDL_Project_XML_Workflow here].

The Data Entry Specs can be found [http://pythia.mpiwg-berlin.mpg.de/department1/mpdl/despecs here] (some old versions are [wiki:DataEntrySpecs here]).

An [wiki:OverviewWorkOrders overview] of the results so far.

A list of [wiki:policyDocuments policy documents].

Pictures of the data entry can be found [wiki:DE_Pictures here].

Bug reports and feature requests for ECHO can be filed [http://pythia.mpiwg-berlin.mpg.de/itgroup/bugReport/ here].

[https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/Arboreal Arboreal]'s source code and wiki.

Workflows and integration with eSciDoc: see [wiki:workflows here].

Missing texts and pictures in [wiki:Archimedes_Problems Archimedes].

== 1. Data Entry Specs WG ==

=== Existing data entry specs ===

Some preliminary notes on character issues can be found on the page CharacterIssues.

Some old DE specs can be found under LegacySpecs.

Raw data entry versions of Archimedes texts can be accessed through the [http://archimedes.mpiwg-berlin.mpg.de/cvs-web/read/cvswebread.cgi/texts/archimedes/raw/ WebCVS interface].

Malcolm's [wiki:HighLevelRequirements high level requirements] from the large team meeting on 2008-09-11.

=== Text workflow ===

Our standards for digital images from external vendors can be found [wiki:"Image standards" here].

DFG Practical Guidelines on Digitisation: [http://www.dfg.de/forschungsfoerderung/wissenschaftliche_infrastruktur/lis/download/praxisregeln_digitalisierung.pdf German], [http://www.dfg.de/forschungsfoerderung/wissenschaftliche_infrastruktur/lis/download/praxisregeln_digitalisierung_en.pdf English].

Some [wiki:SampleTexts Sample texts] from the ECHO collection.

For problems with and requests for ECHO, see [wiki:EchoRemarks here].

["Provisional list"] of books to be transcribed: Batch 1 and 2 from this list have been sent to China.

["First evaluation"] of the work sample.

The next batch of books can be found [http://fm8-server.mpiwg-berlin.mpg.de/fmi/iwp/res/iwp_home.html here] (click on "Digitalisierung").

An overview of the books possibly included in the next batch can be found [wiki:"Intermediate Batch 6" here].

Some [wiki:FormaxQueries letters] from [http://www.formax.com.cn/ Formax] have been copied to the wiki for further discussion.

[wiki:OverviewWorkOrders2008 Overview] of the five Work Orders to Formax in 2008.

[wiki:"Regex from Alvarus" Collection] of Regular Expressions for replacing abbreviations.

=== References on encoding ===

 * [http://www.cs.tut.fi/~jkorpela/chars.html A tutorial on character code issues] (by J. Korpela)
 * [http://www.unicode.org/ Unicode Home Page]
 * [http://www.unicode.org/charts/ Code Charts By Script (Unicode 5.1)]
 * [http://www.unicode.org/reports/tr15/ Unicode Standard Annex #15: UNICODE NORMALIZATION FORMS]
 * [http://proquest.safaribooksonline.com/9780596102425/dedication Fonts & Encodings] (O'Reilly book by Y. Haralambous, English ed.)
 * [http://search.cpan.org/~sadahiro/Unicode-Normalize-1.03/Normalize.pm Documentation for Unicode::Normalize Perl module] (from [http://www.cpan.org/ CPAN])
 * [http://www.w3.org/2003/entities/iso8879doc/overview.html ISO 8879 entities] (from W3C)
 * [http://www.tei-c.org The Text Encoding Initiative (TEI)]
 * [http://www.tlg.uci.edu/BetaCode.html Beta Code]
 * [http://www.geonames.de/codlang.html#script ISO 15924 script codes]

=== Additional resources ===

Some material about [wiki:GreekLigatures Greek Ligatures].

Our [wiki:BookRecommendations book recommendations].

On abbreviations: [http://libcat.mpiwg-berlin.mpg.de/bibsys/FMPro?-db=bibliothekskatalog_mpiwg&-lay=cgi&All_fields=abbreviatu&-format=record.html&-max=1&-find= Lexicon abbreviaturarum] by Adriano Cappelli

=== Completed specs ===

See [wiki:DataEntrySpecs here].

=== Results from China ===

See [http://pythia.mpiwg-berlin.mpg.de/department1/mpdl/raw-texts here].

== 2. Document Schema WG ==

Malcolm's [wiki:SchemaHighLevelRequirements Schema high level requirements] from the large team meeting on 2008-09-18.

Two documents that will serve as starting points for the Document Schema can be found [https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/mpdl-project-content/attachment/wiki/WikiStart/echo_V1.xml here] and [https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/mpdl-project-content/attachment/wiki/WikiStart/ECHO00001A2B3CX_V2.xml here].

Discussion of the [wiki:Metadata].

=== References ===

 * Relax NG
   * [http://proquest.safaribooksonline.com/0596004214/relax-PREFACE-2 Relax NG] (O'Reilly book by E. van der Vlist)
   * [http://books.xmlschemata.org/relaxng/ The GFDL release] of this book, along with updates (html)
   * [http://www.thaiopensource.com/relaxng/trang.html trang] (open source schema converter written in Java)
   * [http://relaxng.org/compact-tutorial-20030326.html RELAX NG Compact Syntax Tutorial]
   * a [https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/mpdl-project-content/attachment/wiki/WikiStart/schema.tar.gz zipped tarball] with schemas for some well-known document types
   * [http://enlil.museum.upenn.edu/cdl/doc/XDF XDF] : XML Documentation Format (literate programming with Relax NG)
 * [http://dublincore.org/ Dublin Core (Metadata)]
 * [http://www.loc.gov/standards/iso639-2/ ISO 639-2] (codes for natural languages)
 * [http://www.w3.org/TR/NOTE-datetime ISO 8601] (date and time formats; brief reference from W3C)

=== Tools ===

 * [http://xmlstar.sourceforge.net/ XMLStarlet] Command Line XML Toolkit
 * [http://www.xmlsoft.org/ libxml2] contains ''xmllint'', a command line tool for validating
 * [http://www.id.cbs.dk/~dh/corpus/tools/MXTERMINATOR.html MXTerminator] A tool for sentence boundary detection

== Documentation about trac ==

 * WikiFormatting -- detailed description of available Wiki formatting commands
 * TracGuide -- Built-in Documentation
 * [http://trac.edgewall.org/ The Trac project] -- Trac Open Source Project
 * [http://trac.edgewall.org/wiki/TracFaq Trac FAQ] -- Frequently Asked Questions
 * TracSupport -- Trac Support

For a complete list of local wiki pages, see TitleIndex.