MPIWG-MPDL Content Project
This is the wiki for the XML Workflow Service subproject within the cooperative project between the MPIWG and MPDL. The other subproject is the Software Development Project.
The main achievements of this subproject are:
- The Data Entry Specifications, i.e. the rules for transcribing European and Chinese texts
- The Schema for the XML versions of the transcriptions
- The Workflow for transforming raw transcriptions into valid XML texts
- Nearly 80 XML texts
Link to the project mailing list. For meeting protocols, check the ProtocolIndex. The protocol of the meeting with the MPDL (Nov 2008) can be found here.
The Data Entry Specs can be found here (some old versions are here).
An overview of the results so far.
A list of policy documents.
Pictures of the data entry can be found here.
Bugs and feature requests for ECHO can be filed here.
Arboreal's source code and wiki
Workflows and integration with eSciDoc here.
Missing texts and pictures in Archimedes
The workflow for updating ECHO XML documents (especially: aligning the XML to new images)
Information about workshops attended
Overview pages (under construction)
1. Data Entry Specs WG
Existing data entry specs
Some preliminary notes on character issues are to be found on the page CharacterIssues. Some old DE specs may be found under LegacySpecs. Raw data entry versions of Archimedes texts can be accessed through the WebCVS interface.
Malcolm's high level requirements from the large team meeting on 2008-09-11.
Text workflow
Our standards for digital images from external vendors can be found here. DFG Practical Guidelines on Digitisation: German, English.
Some sample texts from the ECHO collection. For problems with and requests for ECHO, see here.
Provisional list of books to be transcribed: Batches 1 and 2 from this list have been sent to China. First evaluation of the work sample.
The next batch of books can be found here (click on "Digitalisierung"). An overview of the books that may be included in the next batch can be found here.
Some letters from Formax have been copied to the wiki for further discussion.
Overview of the five Work Orders to Formax in 2008.
Collection of Regular Expressions for replacing abbreviations.
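A collection like this can be applied as an ordered list of pattern and replacement pairs. The following is a minimal Python sketch under stated assumptions: the two rules shown (`q;` for `que`, and e caudata `ę` for `ae`) are common conventions in early modern Latin prints chosen purely for illustration, not entries taken from the project's collection, and the names `RULES` and `expand_abbreviations` are hypothetical.

```python
import re

# Illustrative rules only (assumptions, not the project's actual collection).
# Each entry is (compiled pattern, replacement); rules are applied in order.
RULES = [
    (re.compile(r"(?<=\w)q;"), "que"),  # word-final "q;" as in "atq;" -> "atque"
    (re.compile("\u0119"), "ae"),       # e caudata (U+0119) -> "ae"
]


def expand_abbreviations(text, rules=RULES):
    """Apply every replacement rule to the text, in list order."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(expand_abbreviations("atq; pr\u0119cipue"))
```

Keeping the rules in an ordered list matters when one replacement can create or destroy the context another rule depends on, so the order should be treated as part of the rule set.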
References on encoding
- A tutorial on character code issues (by J. Korpela)
- Unicode Home Page
- Fonts & Encodings (O'Reilly book by Y. Haralambous, English ed.)
- Documentation for Unicode::Normalize Perl module (from CPAN)
- ISO 8879 entities (from W3C)
- The Text Encoding Initiative (TEI)
- Beta Code
- ISO 15924 script codes
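Several of these references concern Unicode normalization, which matters when transcriptions mix precomposed and combining characters. As a rough sketch of the idea, the following uses Python's standard `unicodedata` module (swapped in here for the Perl module listed above); the helper names `to_nfc` and `same_text` are hypothetical, and choosing NFC as the target form is an assumption, not a documented project decision.

```python
import unicodedata


def to_nfc(text):
    """Return text in Unicode Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", text)


def same_text(a, b):
    """Compare two strings after normalizing both to NFC."""
    return to_nfc(a) == to_nfc(b)


if __name__ == "__main__":
    decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT
    precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
    print(decomposed == precomposed)           # raw comparison
    print(same_text(decomposed, precomposed))  # comparison after normalization
```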
Additional resources
Some material about Greek Ligatures. Our book recommendations.
On abbreviations: Lexicon abbreviaturarum by Adriano Cappelli
Completed specs
See here. There is also a list which shows what images were used in the examples.
Results from China
See here.
2. Document Schema WG
Malcolm's Schema high level requirements from the large team meeting on 2008-09-18.
Two documents that will serve as starting points for the Document Schema can be found here and here.
Discussion of the Metadata.
References
- Relax NG
- Relax NG (O'Reilly book by E. van der Vlist)
- The GFDL release of this book, along with updates (html)
- trang (open source schema converter written in Java)
- RELAX NG Compact Syntax Tutorial
- a zipped tarball with schemas for some well-known document types
- XDF : XML Documentation Format (literate programming with Relax NG)
- Dublin Core (Metadata)
- ISO 639-2 (codes for natural languages)
- ISO 8601 (date and time formats; brief reference from W3C)
Tools
- XMLStarlet Command Line XML Toolkit
- libxml2 contains xmllint, a command-line tool for validating XML documents
- MXTerminator, a tool for sentence boundary detection
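As an illustration of how xmllint might be called from a script to check a text against a Relax NG schema, here is a minimal Python sketch. It assumes `xmllint` is installed and on the PATH, and that the schema is in Relax NG XML syntax (a compact-syntax schema would first need converting, for example with trang). The file names and the helper names `build_xmllint_command` and `is_valid` are hypothetical.

```python
import subprocess


def build_xmllint_command(xml_path, schema_path):
    """Build the argument list for validating one document.

    --noout   suppresses printing of the parsed document
    --relaxng validates against a Relax NG schema in XML syntax
    """
    return ["xmllint", "--noout", "--relaxng", schema_path, xml_path]


def is_valid(xml_path, schema_path):
    """Return True if xmllint exits with status 0 (document validates)."""
    result = subprocess.run(
        build_xmllint_command(xml_path, schema_path),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_xmllint_command("text.xml", "schema.rng"))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.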
Documentation about Trac
- WikiFormatting -- detailed description of available Wiki formatting commands
- TracGuide -- Built-in Documentation
- The Trac project -- Trac Open Source Project
- Trac FAQ -- Frequently Asked Questions
- TracSupport -- Trac Support
For a complete list of local wiki pages, see TitleIndex.
Attachments (7)
- MPDL_project_desc.pdf (210.8 KB), added 16 years ago: MPIWG proposal within the MPDL framework
- ECHO-DE-draft.oo3.zip (5.0 KB), added 16 years ago: DE draft guidelines (OmniOutliner file, zipped)
- transcr.pdf (129.8 KB), added 16 years ago: Old Archimedes Project transcription workflow (B. Fuchs)
- archimedes.pen (7.6 KB), added 16 years ago: special entities used in Archimedes documents
- echo_V1.xml (6.0 KB), added 16 years ago
- ECHO00001A2B3CX_V2.xml (387.1 KB), added 16 years ago
- schema.tar.gz (53.1 KB), added 16 years ago: sample Relax NG schemas for well-known document types