Changes between Version 12 and Version 13 of WO3_Euclid_1966
Timestamp: Feb 26, 2009, 9:19:28 AM
WO3_Euclid_1966
=== Emendations of the raw text ===

normalize the zero in modern page numbers
{{{
s!○!〇!g; # white circle U+25CB --> ideographic number zero U+3007
}}}

ignore outdentation in the preface
{{{
if ($line < 153) { s!<p x>!<p>!; }
}}}

pre-process notes that continue on the next line
{{{
...
}}}

fill in the unknown characters (MSi)
{{{
s!<001>!轂!g; # s!<001>!<unknown code="001" unicode="8F42">轂</unknown>!g;
s!<002>!<unknown code="002" unicode="2F88D">庶</unknown>!g; # the actual Unicode character 庶 breaks oXygen
}}}

clarify <?> (the list is not complete!)
{{{
s!<？>!<?>!; # line 811: fullwidth question mark U+FF1F --> ASCII question mark U+003F
s!愈<\?>!愈!g; # MSi: the reading is correct
s!丙、等。<\?>而戊丙丁、與甲乙丙、又等。!丙、等。而戊丙丁、與甲乙丙、又等。!; # line 1041
# (line 1041: MSi: It is in the middle of a sentence, but a period at this position is quite common nonetheless.)
}}}

missing line breaks (the list is not complete!)
{{{
s!小於兩直角。則此二橫直線。!小於兩直角。則此二橫<lb/>直線。!; # line 403; may have to do with the neighboring figure
s!俱小於直角。或幷之小於兩直角。!俱小於直角。或幷之小<lb/>於兩直角。!; # line 404
}}}

normalize the hash in the table
{{{
s!＃!#!g; # fullwidth number sign U+FF03 --> ASCII hash, i.e. number sign U+0023
}}}

move the only table in the text (ECHO p.327) out of its surrounding sentence
{{{
s!却云十六與十二之比例。若!却云十六與十二之比例。!; # line 4562
s!八與三、及二與四之比例。!若<lb/>八與三、及二與四之比例。!; # line 4573
}}}

misc. emendations
{{{
s!N12<114608657010!N12x114608657010!; # line 5: replace "<" in library stamp junk
s!<pb 六><h>幾何原本 卷一之首</h>!<pb 六><rh>幾何原本 卷一之首</rh>!; # line 245 (obvious mistake)
…
}}}

=== Further processing steps ===

 * metadata
 * unknown characters
 * figures
 * the table on ECHO p.327
 * ad hoc tagging of book covers, preface, chapters, chapter heads, chapter mains, backmatter
 * page breaks
 * The ECHO pages 215 to 220 duplicate the pages 209 to 214 and have been typed only once
 * headings
 * notes in headings
 * headings at the lowest level
 * paragraphs
 * normalize the periods: there should always be a period before </p> and </sm>
   (if the period is missing, insert an ASCII period)
 * turn small text into notes
 * tag sentences
 * end each line with <lb/>
 * outdented paragraphs

=== Some remaining issues ===

 * Some details of the metadata may be incorrect.
 * Will the <var> in figures collide with the normal <var>?
 * Replacing <001> with the correct character works fine, but <002> lies in a supplementary plane of Unicode (U+2F88D) and crashes oXygen, so I have used the standard form of this character instead.
 * Some <?> have already been post-processed, and I have removed the corresponding <?> tags, because a <?> tag has no value in itself once the line has been checked. I have compiled a list of lines that contain <?> and/or @ and have not been post-processed yet. ("。</s><s><unsure/>" means that the period is unclear; there is also one artifact sentence, "。</s><s><unsure/>.</s>".)
 * The <desc> and <var> in figures have not been used very consistently. (<cap> was not used at all, but I know of only one figure where it would make sense to use it.)
 * Figures and notes have no place attribute.
 * No <num>, <var> (outside of figures), <ptr>, or IDs yet.
 * The individual parts of problems with more than one part have not been encoded yet.
 * Four books, i.e. four titles: one in the front (attribute n=1) and three in the body (n=1, 2, 3).
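Two of the further processing steps listed above, normalizing the periods before </p> and </sm> and ending each line with <lb/>, could look roughly like the following Perl sketch. This is not the script actually used for this text: the helper names normalize_periods and add_line_break are hypothetical, and the sketch assumes that a paragraph or small-text span already ends with a period if the character directly before </p> or </sm> is 。 (U+3002) or an ASCII period.

```perl
use strict;
use warnings;
use utf8;    # the patterns below contain the ideographic full stop 。 (U+3002)

# Hypothetical helper: insert an ASCII period before </p> or </sm> unless the
# character directly before the closing tag is 。, an ASCII period, or ">"
# (i.e. the end of another tag, as in </sm></p>, which is left alone).
sub normalize_periods {
    my ($line) = @_;
    $line =~ s!(?<![。.>])(</(?:p|sm)>)!.$1!g;
    return $line;
}

# Hypothetical helper: strip a trailing newline and make sure the line
# ends with <lb/> (without adding a second one if it is already there).
sub add_line_break {
    my ($line) = @_;
    chomp $line;
    $line .= '<lb/>' unless $line =~ m!<lb/>\z!;
    return $line;
}
```

Both helpers work on a single line of text. Checking only the one character before the closing tag keeps the pattern a fixed-width lookbehind, at the cost of skipping closing tags that directly follow another tag.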