Schema support
The MPDL document storage and query system supports the document schemas Archimedes and Echo. Support for a subset of TEI (Lite) and, for URIs, a subset of XPointer is planned for the near future.
The elements "pb" (page break) and "s" (sentence) receive special treatment. A dedicated fast method retrieves a given page, that is, the fragment between two "pb" elements of a document. All fulltext queries within documents are executed against the "s" elements, and each resulting hit carries its position in the document (page number and sentence number).
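The roles of "pb" and "s" can be illustrated with a small Python sketch. It is not the MPDL implementation (the fast page retrieval method is not described here); it is a naive linear scan using only the standard library. The sample document, the function names `sentences_by_page` and `search`, and the choice to count sentences per page rather than per document are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document: two pages, two sentences each.
SAMPLE = """<text><body>
<pb/><p><s>Alpha one.</s><s>Beta two.</s></p>
<pb/><p><s>Gamma three.</s><s>Alpha four.</s></p>
</body></text>"""


def sentences_by_page(root):
    """Map page number -> list of sentence texts, in document order.

    A page is taken to be everything between one "pb" and the next.
    """
    pages = {}
    page = 0
    for elem in root.iter():  # iter() walks the tree in document order
        if elem.tag == "pb":
            page += 1
            pages[page] = []
        elif elem.tag == "s" and page > 0:
            # itertext() also collects text that follows inline elements such as "lb"
            pages[page].append("".join(elem.itertext()))
    return pages


def search(root, term):
    """Return (page number, sentence number, text) for each matching sentence.

    Assumption: sentence numbers restart at 1 on every page.
    """
    hits = []
    for page, sentences in sentences_by_page(root).items():
        for number, text in enumerate(sentences, start=1):
            if term.lower() in text.lower():
                hits.append((page, number, text))
    return hits


root = ET.fromstring(SAMPLE)
print(sentences_by_page(root)[2])  # sentences on the second page
print(search(root, "alpha"))       # hits with page and sentence number
```

A real system would index pages and sentences instead of rescanning the document for every request; the sketch only shows what a "page" and a "hit position" mean in terms of the markup.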
Archimedes
Schema
The Archimedes schema is developed by the Archimedes project and can be found here.
Example
In the following simple example, the metadata part consists of 4 elements ("author", "title", "lang", "date"), and the text part consists of 2 pages ("pb"), each with 2 paragraphs ("p") that contain 3 sentences ("s") each.
<?xml version="1.0" encoding="UTF-8"?>
<archimedes xmlns:xlink="http://www.w3.org/1999/xlink">
  <info>
    <author>Name, Prename</author>
    <title>Title</title>
    <lang>en</lang>
    <date>1789</date>
  </info>
  <text>
    <body>
      <pb xlink:href="0001.jpg"/>
      <p>
        <s>This is the first sentence of the first paragraph.</s>
        <s>This is the second sentence of the first paragraph.</s>
        <s>This is the third sentence of the first paragraph.</s>
      </p>
      <p>
        <s>This is the first sentence of the second paragraph.</s>
        <s>This is the second sentence of the second paragraph.</s>
        <s>This is the third sentence of the second paragraph.</s>
      </p>
      <pb xlink:href="0002.jpg"/>
      <p>
        <s>This is the first sentence of the first paragraph with line <lb/>break.</s><lb/>
        <s>This is the second sentence of the first paragraph with line <lb/>break.</s><lb/>
        <s>This is the third sentence of the first paragraph with line <lb/>break.</s><lb/>
      </p>
      <p>
        <s>This is the first sentence of the second paragraph.</s>
        <s>This is the second sentence of the second paragraph.</s>
        <s>This is the third sentence of the second paragraph.</s>
      </p>
    </body>
  </text>
</archimedes>
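As a hedged illustration of how such an Archimedes document might be read by a client, the following Python sketch extracts the "info" metadata and the page image references. It uses only the standard library; the shortened sample document and the variable names are assumptions for illustration and not part of the MPDL system. Note that "href" lives in the XLink namespace, so it has to be addressed by its qualified name.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# Shortened, hypothetical variant of the example document.
ARCHIMEDES_DOC = """<archimedes xmlns:xlink="http://www.w3.org/1999/xlink">
  <info>
    <author>Name, Prename</author>
    <title>Title</title>
    <lang>en</lang>
    <date>1789</date>
  </info>
  <text>
    <body>
      <pb xlink:href="0001.jpg"/>
      <p><s>One.</s></p>
      <pb xlink:href="0002.jpg"/>
      <p><s>Two.</s></p>
    </body>
  </text>
</archimedes>"""

root = ET.fromstring(ARCHIMEDES_DOC)

# The Archimedes elements carry no namespace, so plain tag names work.
info = {child.tag: child.text for child in root.find("info")}

# Namespaced attributes are addressed as "{namespace-uri}localname".
page_images = [pb.get(f"{{{XLINK}}}href") for pb in root.iter("pb")]

print(info)
print(page_images)
```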
Echo
Schema
The MPDL Echo schema is developed by the schema group of this project and can be found here.
Elements
An Echo document (element "echo" with namespace "echo") consists of a metadata part (element "metadata"), which contains the Dublin Core metadata of the document, and a fulltext part (element "text"), which contains the content of the document.
Dublin Core metadata elements (namespace dcterms):
- identifier
- creator
- title
- date
- rights
- license
- accessRights
Fulltext elements (with attributes in parentheses):
- text elements: head, div (type, level, style), p (style), pb (file), lb, cb, gap (extent), s
- figure elements: figure, image (file), caption (style), description (style), variables (style), handwritten (xlink:href)
- note elements: note (xlink:label)
- quotation elements: q, quote, blockquote, set-off
- translation elements: foreign (lang, xml:lang), reg (orig)
- mathematical elements: var (type), num, mml:*
- geographical elements: place, event, time
- person elements: person
- xhtml elements: xhtml:* : e.g. table, ul
- other elements: expan, emph (class), ref (target), anchor (type, xlink:label, xlink:href)
Example
In the following simple example, the metadata part consists of 4 Dublin Core elements ("creator", "title", "language", "date"), and the text part consists of 2 pages ("pb"), each with 2 paragraphs ("p") that contain 3 sentences ("s") each.
<?xml version="1.0" encoding="UTF-8"?>
<echo xmlns="http://www.mpiwg-berlin.mpg.de/ns/echo/1.0/" xmlns:dcterms="http://purl.org/dc/terms">
  <metadata>
    <dcterms:creator>Name, Prename</dcterms:creator>
    <dcterms:title>Title</dcterms:title>
    <dcterms:language>en</dcterms:language>
    <dcterms:date>1789</dcterms:date>
  </metadata>
  <text>
    <pb file="0001"/>
    <p>
      <s>This is the first sentence of the first paragraph.</s>
      <s>This is the second sentence of the first paragraph.</s>
      <s>This is the third sentence of the first paragraph.</s>
    </p>
    <p>
      <s>This is the first sentence of the second paragraph.</s>
      <s>This is the second sentence of the second paragraph.</s>
      <s>This is the third sentence of the second paragraph.</s>
    </p>
    <pb file="0002"/>
    <p>
      <s>This is the first sentence of the first paragraph with line <lb/>break.</s><lb/>
      <s>This is the second sentence of the first paragraph with line <lb/>break.</s><lb/>
      <s>This is the third sentence of the first paragraph with line <lb/>break.</s><lb/>
    </p>
    <p>
      <s>This is the first sentence of the second paragraph.</s>
      <s>This is the second sentence of the second paragraph.</s>
      <s>This is the third sentence of the second paragraph.</s>
    </p>
  </text>
</echo>
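Unlike Archimedes, an Echo document places all of its elements in namespaces, which changes how a client has to address them. The following Python sketch, using only the standard library, reads the Dublin Core metadata and the "file" attributes of the page breaks. The shortened sample document, the prefix mapping `NS`, and the variable names are assumptions for illustration; the namespace URIs are copied from the example as written.

```python
import xml.etree.ElementTree as ET

# Prefix mapping used only on the client side; the prefixes are arbitrary,
# the URIs must match the document exactly.
NS = {
    "echo": "http://www.mpiwg-berlin.mpg.de/ns/echo/1.0/",
    "dcterms": "http://purl.org/dc/terms",
}

# Shortened, hypothetical variant of the example document.
ECHO_DOC = """<echo xmlns="http://www.mpiwg-berlin.mpg.de/ns/echo/1.0/"
      xmlns:dcterms="http://purl.org/dc/terms">
  <metadata>
    <dcterms:creator>Name, Prename</dcterms:creator>
    <dcterms:title>Title</dcterms:title>
  </metadata>
  <text>
    <pb file="0001"/>
    <p><s>First.</s></p>
    <pb file="0002"/>
    <p><s>Second.</s></p>
  </text>
</echo>"""

root = ET.fromstring(ECHO_DOC)

# ElementTree stores tags as "{namespace-uri}localname"; strip the URI part
# to obtain plain Dublin Core field names.
metadata = {
    child.tag.split("}", 1)[1]: child.text
    for child in root.find("echo:metadata", NS)
}

# "pb" inherits the default (echo) namespace, so it needs the echo prefix here.
page_files = [pb.get("file") for pb in root.iterfind(".//echo:pb", NS)]

print(metadata)
print(page_files)
```

Searching for a bare "pb" without the namespace prefix would find nothing in an Echo document, which is a common source of empty query results.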
TEI
A subset of TEI(-Lite) will be supported in the near future.
Schema
TEI provides customizations for different purposes. The MPDL project supports a subset of TEI Lite, a widely used customization that includes the basic elements for simple documents. In addition, some useful elements from the full TEI schema (all modules) are supported, as are MathML and SVG, which belong to the full TEI schema plus external additions.
A description of TEI pointers can be found here.
Elements
TEI Lite (with attributes in parentheses):
- metadata elements: fileDesc, titleStmt, author, name, title, publicationStmt, date, idno, availability, sourceDesc, bibl, profileDesc, langUsage, language
- text elements: div, gap (extent), head, lb, lg, l, p, pb, s, seg
- link elements: ref (target), ptr (target), anchor (xml:id)
- figure elements: figure
- note elements: note (place)
- quotation elements: q, quote
- translation elements: foreign (xml:lang), reg
- terminology elements: term
- name elements: name (type="place"), name (type="person"), name (type="organization")
- time elements: date
- mathematical elements: num
- other elements: hi (rend), expan, emph, choice, orig, abbr, sic, corr
TEI (all modules):
- text elements: pb (facs)
- figure elements: figure (facs)
- name elements: placeName (type), persName
- other elements: ex, am
Additional
- MathML elements: m:*
- SVG elements: svg:*
- XHTML elements (e.g. table and list elements): xhtml:* (planned)
Example
In the following simple example, the metadata part (teiHeader) contains the 4 elements "author", "title", "language" and "date", and the text part consists of 2 pages ("pb"), each with 2 paragraphs ("p") that contain 3 sentences ("s") each, followed by a figure and a formula in MathML. The second line (beginning with "<?oxygen RNGSchema=") can be omitted when no additional TEI elements (e.g. MathML) are used.
<?xml version="1.0" encoding="UTF-8"?>
<?oxygen RNGSchema="http://mpdl-proto.mpiwg-berlin.mpg.de/exist/rest/db/mpdl/schema/tei/tei_allPlus.rnc" type="compact"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xmlns:m="http://www.w3.org/1998/Math/MathML">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Title</title>
        <author>Name, Prename</author>
      </titleStmt>
      <publicationStmt>
        <date>1789</date>
        <idno>/experimental/yourDirectory</idno>
        <availability status="free">
          <p>This text is available under Creative Commons license CC-BY</p>
        </availability>
      </publicationStmt>
      <sourceDesc>
        <bibl> </bibl>
      </sourceDesc>
    </fileDesc>
    <profileDesc>
      <langUsage>
        <language ident="en">English</language>
      </langUsage>
    </profileDesc>
  </teiHeader>
  <text>
    <body>
      <pb facs="0001.jpg"/>
      <p>
        <s>This is the first sentence of the first paragraph. And <ref target="http://slime.de">here</ref> is a link.</s>
        <s>This is the second sentence of the first paragraph.</s>
        <s>This is the third sentence of the first paragraph.</s>
      </p>
      <p>
        <s>This is the first sentence of the second paragraph.</s>
        <s>This is the second sentence of the second paragraph.</s>
        <s>This is the third sentence of the second paragraph.</s>
      </p>
      <pb facs="0002.jpg"/>
      <p>
        <s>This is the first sentence of the first paragraph with line <lb/>break.</s><lb/>
        <s>This is the second sentence of the first paragraph with line <lb/>break.</s><lb/>
        <s>This is the third sentence of the first paragraph with line <lb/>break.</s><lb/>
      </p>
      <p>
        <s>This is the first sentence of the second paragraph.</s>
        <s>This is the second sentence of the second paragraph.</s>
        <s>This is the third sentence of the second paragraph.</s>
      </p>
      <figure facs="fig-0001.jpg"></figure>
      <p>And this is a formula in MathML:
        <formula>
          <m:math>
            <m:mrow>
              <m:msup>
                <m:mi>x</m:mi>
                <m:mn>2</m:mn>
              </m:msup>
            </m:mrow>
          </m:math>
        </formula>
      </p>
    </body>
  </text>
</TEI>
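To show how the TEI header, the page breaks and the MathML addition relate in practice, here is a minimal Python sketch using only the standard library. It reads the title and language from the header, collects the "facs" references of the page breaks, and checks whether the document uses MathML, which is the case in which the extended schema declaration on the second line is needed. The shortened sample document, the prefix mapping `NS`, and the variable names are assumptions for illustration, not part of the MPDL system.

```python
import xml.etree.ElementTree as ET

NS = {
    "tei": "http://www.tei-c.org/ns/1.0",
    "m": "http://www.w3.org/1998/Math/MathML",
}

# Shortened, hypothetical variant of the example document.
TEI_DOC = """<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:m="http://www.w3.org/1998/Math/MathML">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Title</title>
        <author>Name, Prename</author>
      </titleStmt>
    </fileDesc>
    <profileDesc>
      <langUsage>
        <language ident="en">English</language>
      </langUsage>
    </profileDesc>
  </teiHeader>
  <text>
    <body>
      <pb facs="0001.jpg"/>
      <p><s>A sentence.</s></p>
      <p>A formula:
        <formula>
          <m:math><m:msup><m:mi>x</m:mi><m:mn>2</m:mn></m:msup></m:math>
        </formula>
      </p>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(TEI_DOC)

# Header fields sit at fixed paths below teiHeader.
title = root.findtext(
    "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", namespaces=NS
)
language = root.find(".//tei:language", NS).get("ident")

# Page image references are stored in the "facs" attribute of "pb".
facs = [pb.get("facs") for pb in root.iterfind(".//tei:pb", NS)]

# Any element in the MathML namespace means the plain TEI Lite subset
# is not sufficient for this document.
uses_mathml = root.find(".//m:math", NS) is not None

print(title, language, facs, uses_mathml)
```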