Changes between Version 12 and Version 13 of WO3_Euclid_1966


Timestamp: Feb 26, 2009, 9:19:28 AM
Author: Wolfgang Schmidle

=== Emendations of the raw text ===

normalize the zero in modern page numbers
{{{
s!○!〇!g;  # white circle U+25CB --> ideographic number zero U+3007
}}}
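
The two circles are almost impossible to tell apart in an editor. The same rule can also be written with code-point escapes. What follows is a minimal Perl sketch. The sub name normalize_zero and the sample string are made up for illustration.

{{{
#!perl
use strict;
use warnings;

# Same rule as above, but with code-point escapes instead of the literal
# look-alike characters: white circle U+25CB --> ideographic zero U+3007.
# (Hypothetical helper, for illustration only.)
sub normalize_zero {
    my ($text) = @_;
    $text =~ s!\x{25CB}!\x{3007}!g;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print normalize_zero("\x{4E8C}\x{25CB}\x{4E00}"), "\n";   # prints 二〇一
}}}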

ignore outdentation in the preface
{{{
if ($line < 153) { s!<p x>!<p>!; }  # $line: line number in the raw text; lines 1 to 152 are the preface
}}}
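
This rule assumes that $line holds the number of the current line of the raw text. Below is a minimal sketch of a line-by-line driver that uses Perl's built-in $. counter. The sub name emend and the in-memory input are illustrative assumptions; they are not taken from the actual conversion script.

{{{
#!perl
use strict;
use warnings;

# Hypothetical driver: read the raw text line by line and apply the
# line-dependent rule. Perl's $. holds the number of the line just read.
sub emend {
    my ($raw) = @_;
    open my $fh, '<', \$raw or die "cannot open in-memory text: $!";
    my $out = '';
    while (my $text = <$fh>) {
        my $line = $.;
        # ignore outdentation in the preface (lines 1 to 152)
        if ($line < 153) { $text =~ s!<p x>!<p>!; }
        $out .= $text;
    }
    close $fh;
    return $out;
}

print emend("<p x>preface\n");   # prints "<p>preface"
}}}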

pre-process notes that continue on the next line
{{{
...
}}}

fill in the unknown characters (MSi)
{{{
s!<001>!轂!g;  # alternative: s!<001>!<unknown code="001" unicode="8F42">轂</unknown>!g;
s!<002>!<unknown code="002" unicode="2F88D">庶</unknown>!g; # the actual character U+2F88D breaks oXygen, so the standard 庶 is used
}}}
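
More numbered placeholders may turn up. In that case a lookup table could be easier to maintain than one substitution per code. Below is a hypothetical sketch: the table %unknown and the sub fill_unknown are illustrative names. Placeholders without an entry are left as they are.

{{{
#!perl
use strict;
use warnings;
use utf8;

# Hypothetical lookup table: placeholder code => replacement text.
my %unknown = (
    '001' => "\x{8F42}",    # 轂, the plain character works
    '002' => '<unknown code="002" unicode="2F88D">庶</unknown>',   # standard form of the character
);

# Replace every <NNN> that has an entry in the table; leave the rest alone.
sub fill_unknown {
    my ($text) = @_;
    $text =~ s{<(\d{3})>}{exists $unknown{$1} ? $unknown{$1} : "<$1>"}ge;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print fill_unknown('<001> <002> <003>'), "\n";   # <003> stays as it is
}}}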

clarify <?> (the list is not complete!)
{{{
s!<？>!<?>!; # line 811: fullwidth question mark U+FF1F --> ASCII question mark U+003F
s!愈<\?>!愈!g; # MSi: the reading is correct
s!丙、等。<\?>而戊丙丁、與甲乙丙、又等。!丙、等。而戊丙丁、與甲乙丙、又等。!; # line 1041
# (line 1041: MSi: It is in the middle of a sentence, but a period at this position is quite common nonetheless.)
}}}
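
In a Perl pattern, ? is a quantifier. An ASCII <?> therefore has to be written as <\?>, as in the 愈 rule. Otherwise <? means "an optional <", and the substitution silently does nothing. Perl's \Q...\E quoting would also work. Below is a minimal sketch on an illustrative fragment.

{{{
#!perl
use strict;
use warnings;
use utf8;

my $raw = '丙、等。<?>而戊丙丁';   # illustrative fragment with an ASCII <?> tag

# unescaped: "<?" means "an optional '<'", so this pattern needs "。>而" or "。<>而"
(my $unescaped = $raw) =~ s!等。<?>而!等。而!;

# escaped: "\?" is a literal question mark, so the tag is removed
(my $escaped = $raw) =~ s!等。<\?>而!等。而!;

binmode STDOUT, ':encoding(UTF-8)';
print "$unescaped\n";   # 丙、等。<?>而戊丙丁 (unchanged)
print "$escaped\n";     # 丙、等。而戊丙丁
}}}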

missing line breaks (the list is not complete!)
{{{
s!小於兩直角。則此二橫直線。!小於兩直角。則此二橫<lb/>直線。!; # line 403; may have to do with the neighboring figure
s!俱小於直角。或幷之小於兩直角。!俱小於直角。或幷之小<lb/>於兩直角。!; # line 404
}}}

normalize the hash in the table
{{{
s!＃!#!g; # fullwidth number sign U+FF03 --> ASCII hash, i.e. number sign U+0023
}}}

move the only table in the text (ECHO p.327) out of its surrounding sentence
{{{
s!却云十六與十二之比例。若!却云十六與十二之比例。!;  # line 4562
s!八與三、及二與四之比例。!若<lb/>八與三、及二與四之比例。!;  # line 4573
}}}

misc. emendations
{{{
s!N12<114608657010!N12x114608657010!; # line 5: replace "<" in library stamp junk
s!<pb 六><h>幾何原本 卷一之首</h>!<pb 六><rh>幾何原本 卷一之首</rh>!;  # line 245 (obvious mistake)
...
}}}
=== Further processing steps ===

 * metadata
 * unknown characters
 * figures
 * the table on ECHO p.327
 * ad hoc tagging of book covers, preface, chapters, chapter heads, chapter mains, backmatter
 * page breaks
   * The ECHO pages 215 to 220 duplicate pages 209 to 214 and have been typed only once.
 * headings
   * notes in headings
   * headings at the lowest level
 * paragraphs
   * normalize the periods: there should always be a period before </p> and </sm> (if the period is missing, insert an ASCII period)
   * turn small text into notes
   * tag sentences
   * end each line with <lb/>
   * outdented paragraphs
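
Several of the paragraph steps are simple substitutions. Below is a sketch of four of them. It assumes the following markup: <p>...</p> for paragraphs, <sm>...</sm> for small text, and <note> as the target element for notes. It also assumes that a sentence ends with either 。 (U+3002) or an ASCII period. The sub names are hypothetical, and tag_sentences only handles plain text without nested tags.

{{{
#!perl
use strict;
use warnings;
use utf8;

# Assumed markup: <p>...</p> paragraphs, <sm>...</sm> small text,
# and 。 (U+3002) or an ASCII "." as the period.

# normalize the periods: insert an ASCII period before </p> or </sm>
# unless a period is already there
sub normalize_periods {
    my ($text) = @_;
    $text =~ s{(?<![。.])</(p|sm)>}{.</$1>}g;
    return $text;
}

# turn small text into notes (<note> as the target element is an assumption)
sub small_to_note {
    my ($text) = @_;
    $text =~ s{<sm>(.*?)</sm>}{<note>$1</note>}g;
    return $text;
}

# tag sentences: split plain text after each period, wrap each piece in <s>
sub tag_sentences {
    my ($content) = @_;
    my $out = '';
    $out .= "<s>$_</s>" for split /(?<=[。.])/, $content;
    return $out;
}

# end each line with <lb/>, unless it already ends with one
sub end_with_lb {
    my ($line) = @_;
    chomp $line;
    return $line =~ m{<lb/>\z} ? $line : "$line<lb/>";
}

binmode STDOUT, ':encoding(UTF-8)';
print tag_sentences('甲乙。丙丁。'), "\n";                  # <s>甲乙。</s><s>丙丁。</s>
print end_with_lb(normalize_periods('<p>甲乙</p>')), "\n";   # <p>甲乙.</p><lb/>
}}}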

=== Some remaining issues ===

 * The metadata may be incorrect in some details.
 * Will the <var> in figures collide with the normal <var>?
 * Replacing <001> by the correct character works fine, but <002> is in a higher plane of Unicode and breaks oXygen, so I have used the simpler standard version of this character.
 * Some <?> have been post-processed already, and I have removed the respective <?> tags because a <?> tag has no value in itself once the line has been checked. I have compiled a list of lines which contain <?> and/or @ and haven't been post-processed yet. ("。</s><s><unsure/>" means that the period is unclear. There is also one artifact sentence, "。</s><s><unsure/>.</s>".)
 * The <desc> and <var> in figures have not been used very consistently. (They did not use <cap> at all, but I only know of one figure where it would make sense to use it.)
 * Figures and notes have no place attribute.
 * There are no <num>, <var> (outside of figures) or <ptr> tags, and no IDs yet.
 * The parts of problems with more than one part have not been encoded yet.
 * There are four books, i.e. four titles: one in the front (attribute n=1) and three in the body (n=1, 2, 3).