| 1 | {{{ |
| 2 | #!html |
| 3 | |
| 4 | <h1>Language Specification in Arboreal</h1> |
| 5 | |
| 6 | <p>Arboreal's language architecture (cf. <a |
| 7 | href="textarch.png">schematic</a>) requires that the language of every |
| 8 | piece of text be specified. Arboreal applies the following rules:</p> |
| 9 | |
| 10 | <ol> |
| 11 | <li>Language may be specified in the document metadata. Arboreal |
| 12 | determines the language by using the XPath query under the |
| 13 | <b>&lt;locator&gt;</b> tag in the <b>&lt;metadata&gt;</b> definition |
| 14 | in the docspec file. For Archimedes texts, the language is specified |
| 15 | in the <b>&lt;lang&gt;</b> section under <b>&lt;info&gt;</b>. E.g.: |
| 16 | |
| 17 | |
| 18 | <p><pre> |
| 19 | <b>&lt;info&gt;</b> |
| 20 | ... |
| 21 | <b>&lt;lang&gt;</b>it<b>&lt;/lang&gt;</b> |
| 22 | ... |
| 23 | <b>&lt;/info&gt;</b> |
| 24 | </pre></p> |
| 25 | |
| 26 | <li>Any element (tag) may have a <b>lang</b> attribute. The value set |
| 27 | here applies to the entire subtree for which the element is the root |
| 28 | (unless the setting is overridden by a <b>lang</b> attribute of some |
| 29 | descendant node or nodes). This language setting overrides the |
| 30 | language (if any) that is specified in the document metadata. Note |
| 31 | that the language may be set for the entire document simply by |
| 32 | supplying a <b>lang</b> attribute for the root element. E.g.: |
| 33 | |
| 34 | |
| 35 | <p><pre> |
| 36 | <b>&lt;root lang="la"&gt;</b> |
| 37 | ... |
| 38 | </pre></p> |
| 39 | |
| 40 | <li>The text under certain elements is considered as a single unit, |
| 41 | called an <b>amalgamation</b>. Nodes to which this behavior applies |
| 42 | are called <b>container</b> nodes. The nodes considered containers are |
| 43 | enumerated under <b>&lt;containers&gt;</b> in the docspec file. (Also: |
| 44 | any node that is the root of a subtree containing only text nodes is |
| 45 | automatically considered a container node.) In the Archimedes <acronym |
| 46 | title="Document Type Definition">DTD</acronym>, <b>&lt;s&gt;</b> is a |
| 47 | container. The amalgamation belonging to a container may consist of |
| 48 | text in only a <i>single</i> language. In the case of multilingual |
| 49 | documents, however, it will sometimes be necessary for a container |
| 50 | (e.g., a sentence) to contain text in more than one language. To allow |
| 51 | for this possibility, elements may be defined as <b>subcontainers</b> in |
| 52 | the docspec file. Text that belongs to a subcontainer is treated as |
| 53 | the amalgamation of the subcontainer, not of the (parent) container. |
| 54 | In the Archimedes doctype, <b>&lt;foreign&gt;</b> is defined as a |
| 55 | subcontainer. Thus we can have something like: |
| 56 | |
| 57 | |
| 58 | <p><tt><b>&lt;s id="Academica2.18.3" |
| 59 | lang="la"&gt;</b>Cum enim ita negaret quidquam esse quod |
| 60 | comprehendi posset (id enim volumus esse <b>&lt;foreign |
| 61 | lang="el"&gt;</b>a)kata/lhpton<b>&lt;/foreign&gt;</b>), si |
| 62 | illud esset, sicut Zeno definiret, tale visum (iam enim hoc pro |
| 63 | <b>&lt;foreign |
| 64 | lang="el"&gt;</b>fantasi/a|<b>&lt;/foreign&gt;</b> verbum |
| 65 | satis hesterno sermone trivimus), visum igitur impressum effictumque |
| 66 | ex eo unde esset quale esse non posset ex eo unde non |
| 67 | esset...<b>&lt;/s&gt;</b></tt></p> |
| 68 | |
| 69 | <li>If no language is specified anywhere in the document, the document |
| 70 | is considered to be in the default language. This default may be set |
| 71 | in the <a href="https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/Arboreal/wiki/Configuration">preferences dialog</a>. |
| 72 | |
| 73 | </ol> |
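<p>The precedence described by the rules above can be sketched in Python. This is only an illustration, not Arboreal's actual code: the function name <code>amalgamations</code>, the constants, and the fixed path <code>info/lang</code> (a simplified stand-in for the docspec locator query) are assumptions, and the rule that text-only subtrees automatically count as containers is left out for brevity.</p>

```python
import xml.etree.ElementTree as ET

DEFAULT_LANG = "en"          # assumption: stands in for the preferences default
CONTAINERS = {"s"}           # container named in the Archimedes example
SUBCONTAINERS = {"foreign"}  # subcontainer named in the Archimedes example
METADATA_PATH = "info/lang"  # assumption: simplified stand-in for the locator query


def amalgamations(root):
    """Return a (tag, lang, text) tuple for each container and subcontainer."""
    # Rules 1 and 4: start from the metadata language, else the default.
    meta = (root.findtext(METADATA_PATH) or "").strip()
    doc_lang = (meta or DEFAULT_LANG).lower()
    found = []

    def own_text(elem):
        # Rule 3: collect the text of elem, skipping text that belongs
        # to a subcontainer (that text forms its own amalgamation).
        parts = [elem.text or ""]
        for child in elem:
            if child.tag not in SUBCONTAINERS:
                parts.append(own_text(child))
            parts.append(child.tail or "")
        return "".join(parts)

    def walk(elem, inherited):
        # Rule 2: a lang attribute overrides whatever was inherited.
        lang = elem.get("lang", inherited).lower()
        if elem.tag in CONTAINERS or elem.tag in SUBCONTAINERS:
            found.append((elem.tag, lang, own_text(elem)))
        for child in elem:
            walk(child, lang)

    walk(root, doc_lang)
    return found
```

<p>Walking the tree top-down and passing the inherited language along keeps the override logic in one place: each element either supplies its own <b>lang</b> value or takes the one handed down from its nearest ancestor, the metadata, or the default.</p>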
| 74 | |
| 75 | <hr> |
| 76 | |
| 77 | <p>The code used for the language is always the two- or three-letter |
| 78 | code specified in <a |
| 79 | href="http://lcweb.loc.gov/standards/iso639-2/langcodes.html"><acronym |
| 80 | title="International Organization for Standardization">ISO</acronym> 639</a>. |
| 81 | These codes are <i>not</i> case-sensitive. The codes for languages |
| 82 | we're currently using are:</p> |
| 83 | |
| 84 | <p><blockquote> |
| 85 | <table border="yes"> |
| 86 | <tr><td><code>ar</code></td><td>Arabic</td></tr> |
| 87 | <tr><td><code>de</code></td><td>German</td></tr> |
| 88 | <tr><td><code>en</code></td><td>English</td></tr> |
| 89 | <tr><td><code>el</code></td><td>Greek</td></tr> |
| 90 | <tr><td><code>fr</code></td><td>French</td></tr> |
| 91 | |
| 92 | <tr><td><code>it</code></td><td>Italian</td></tr> |
| 93 | <tr><td><code>la</code></td><td>Latin</td></tr> |
| 94 | <tr><td><code>zh</code></td><td>Chinese</td></tr> |
| 95 | </table> |
| 96 | </blockquote></p> |
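<p>Because the codes are not case-sensitive, a lookup would normalize case before comparing. A minimal sketch, assuming a hypothetical helper <code>language_name</code> and covering only the codes listed in the table above:</p>

```python
# Codes taken from the table above; nothing else is assumed to be supported.
LANGUAGES = {
    "ar": "Arabic",
    "de": "German",
    "en": "English",
    "el": "Greek",
    "fr": "French",
    "it": "Italian",
    "la": "Latin",
    "zh": "Chinese",
}


def language_name(code):
    """Return the language name for a code, ignoring case, or None if unknown."""
    return LANGUAGES.get(code.strip().lower())
```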
| 97 | |
| 98 | <p>For a sample document that illustrates language embedding, see <a href="https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/Arboreal/attachment/wiki/Scrapbook/testbed.xml">testbed.xml</a>.</p> |
| 99 | |
| 100 | }}} |