Changes between Initial Version and Version 1 of 2008-09-12_protocol-DESpecs


Timestamp:
Sep 18, 2008, 3:07:50 PM (16 years ago)
Author:
Klaus Thoden

  • 2008-09-12_protocol-DESpecs

    v1 v1  
     1{{{
     2#!html
     3<h2 class="titleHead">DE Specs Working Group Meeting</h2>
     4<div class="author" ><span
     5class="pplr8t-x-x-120">Klaus Thoden</span></div>
     6<br />
     7<div class="date" ><span
     8class="pplr8t-x-x-120">12 September 2008</span></div>
     9   </div>
     10   <h3 class="sectionHead"><span class="titlemark">1    </span> <a
     11 id="x1-10001"></a>Introduction</h3>
     12<!--l. 30--><p class="noindent" >In this meeting, Wolfgang and Klaus presented their list of things that should be considered when writing
     13the DE Specs.<span class="footnote-mark"><a
     14href="#fn1x0" id="fn1x0-bk"><sup class="textsuperscript">1</sup></a></span><a
     15 id="x1-1001f1"></a>
     16It was pointed out that the specifications should be fairly general to cover a large set of
     17books.
     18<!--l. 37--><p class="noindent" >
     19   <h3 class="sectionHead"><span class="titlemark">2    </span> <a
     20 id="x1-20002"></a>Things to be marked up</h3>
     21<!--l. 39--><p class="noindent" >Based on examples from ECHO, the following points were discussed. Structural markup
     22describes how the text is organized on the page. Positional markup describes how the text is
     23formatted.
     24   <h4 class="subsectionHead"><span class="titlemark">2.1    </span> <a
     25 id="x1-30002.1"></a>Structural markup</h4>
     26<!--l. 55--><p class="noindent" >
     27   <h5 class="subsubsectionHead"><span class="titlemark">2.1.1    </span> <a
     28 id="x1-40002.1.1"></a>Markup done by the digitizers</h5>
     29<!--l. 56--><p class="noindent" >The digitizers will mark up only a few things, mainly headings,
     30paragraphs, columns and marginal notes. All of these will be marked by beginning and end
     31tags.
     32<!--l. 60--><p class="indent" >   Marginal notes should be written where they occur on the page so that they are already
     33roughly anchored to a certain place.
     34                                                                               
     35
     36                                                                               
     37<!--l. 63--><p class="indent" >   When page numbers are found on the page, they will be put as an argument into the
     38header of the page break. Page breaks will be coded as milestones.
     39<!--l. 67--><p class="noindent" >
     40   <h5 class="subsubsectionHead"><span class="titlemark">2.1.2    </span> <a
     41 id="x1-50002.1.2"></a>Things to be ignored</h5>
     42<!--l. 68--><p class="noindent" >Catchwords and signatures at the bottom of the page will be ignored, because they do not
     43carry any useful information.
     44<!--l. 71--><p class="indent" >   Sentences or other semantic units will not be marked up by the digitizers, because it is too
     45difficult.
     46                                                                               
     47
     48                                                                               
     49<!--l. 75--><p class="noindent" >
     50   <h4 class="subsectionHead"><span class="titlemark">2.2    </span> <a
     51 id="x1-60002.2"></a>Positional markup</h4>
     52<!--l. 77--><p class="noindent" >
     53   <h5 class="subsubsectionHead"><span class="titlemark">2.2.1    </span> <a
     54 id="x1-70002.2.1"></a>Ligatures</h5>
     55<!--l. 79--><p class="noindent" >The digitizers will be given a list of ligatures that shows how each of them should be
     56resolved.
     57   <h5 class="subsubsectionHead"><span class="titlemark">2.2.2    </span> <a
     58 id="x1-80002.2.2"></a>Markup of special characters</h5>
     59<!--l. 82--><p class="noindent" >In order not to have the digitizers type too many tags, some features could be marked up
     60more easily with special characters. Thus, text in italics or small caps could be surrounded by underscores (_). Such
     61characters have to be used with care, as texts might actually contain them (especially books
     62from the 20th century).
     63   <h5 class="subsubsectionHead"><span class="titlemark">2.2.3    </span> <a
     64 id="x1-90002.2.3"></a>Punctuation, spatia and hyphens</h5>
     65<!--l. 89--><p class="noindent" >The spatia in the books are not consistent, be it between words, between letters, or between letters and
     66punctuation. As a rule, the digitizers are told not to write a spatium before a punctuation mark,
     67even if there is one in the text.
     68<!--l. 93--><p class="indent" >   As for spatia inside words, nothing can be done to get the digitizers to recognize words.
     69Such errors will have to be emended by NLP tools. This also applies to missing
     70hyphens.
     71   <h5 class="subsubsectionHead"><span class="titlemark">2.2.4    </span> <a
     72 id="x1-100002.2.4"></a>Physical damage</h5>
     73<!--l. 98--><p class="noindent" >Text might be rendered unreadable by folds, creases or even holes. In these cases, the digitizers
     74are supposed to mark these locations with a special tag.
     75<!--l. 102--><p class="noindent" >
     76   <h3 class="sectionHead"><span class="titlemark">3    </span> <a
     77 id="x1-110003"></a>Things to keep in mind</h3>
     78     <ul class="itemize1">
     79     <li class="itemize">The specifications have to be clear and simple
     80     </li>
     81     <li class="itemize">You cannot code everything!</li></ul>
     82                                                                               
     83
     84                                                                               
     85<!--l. 109--><p class="noindent" >
     86   <h3 class="sectionHead"><span class="titlemark">4    </span> <a
     87 id="x1-120004"></a>Next steps</h3>
     88<!--l. 111--><p class="noindent" >A first draft version will be delivered on Friday, 19th September. Version 1.0 is due September
     8929.
     90<!--l. 114--><p class="indent" >   The authors themselves, as well as willing students, are going to type some text using the
     91DESpecs for evaluation purposes.
     92   <div class="footnotes"><!--l. 33--><p class="indent" >      <span class="footnote-mark"><a
     93href="#fn1x0-bk" id="fn1x0"><sup class="textsuperscript">1</sup></a></span><span
     94class="pplr8t-x-x-90">This wiki-page shows the major issues:</span>
     95<br class="newline" />  <a
     96href="https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/mpdl-project-content/wiki/SampleTexts" class="url" ><span
     97class="pcrr8t-x-x-90">https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/mpdl-project-content/wiki/SampleTexts</span></a> </div>
     98 
     99</body></html>
     100
     101}}}
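
The two positional conventions discussed in sections 2.2.2 and 2.2.3 (underscores around italic or small-caps text, and no spatium before punctuation) could be post-processed roughly as sketched below. This is only an illustration under assumptions: the tag name `emph`, the function names and the set of punctuation marks are hypothetical and were not decided in the meeting.

```python
import re

# Hypothetical tag name; the minutes do not specify which tag is used.
EMPH_OPEN, EMPH_CLOSE = "<emph>", "</emph>"


def expand_underscores(text: str) -> str:
    """Replace _..._ spans with paired tags.

    A single, unpaired underscore is left untouched, since a text
    may genuinely contain that character.
    """
    return re.sub(
        r"_([^_\n]+)_",
        lambda m: f"{EMPH_OPEN}{m.group(1)}{EMPH_CLOSE}",
        text,
    )


def drop_space_before_punctuation(text: str) -> str:
    """Remove spaces or tabs directly before a punctuation mark.

    The punctuation set here is an assumption for illustration.
    """
    return re.sub(r"[ \t]+([.,;:!?])", r"\1", text)


if __name__ == "__main__":
    sample = "_Nota_ : the text continues ."
    print(drop_space_before_punctuation(expand_underscores(sample)))
```

Leaving unpaired underscores alone reflects the caution noted in the minutes that some books actually contain the character; real texts would still need manual checking.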