Version 11 (modified by Wolfgang Schmidle, 16 years ago)

Euclid 1966

Euclid (1966), Ji he yuan ben = Jihe yuanben ECHO

Nominally part of WO3, but in effect a separate, special work order. Sent with DESpecs version 1.1.2.

Sent: ok/date. Returned: 2009-01-05, raw text. It came with a list of unknown characters (<001> and <002>; see attachment "Char code.pdf").

1. First Analysis

Difficulties

This is the only Chinese text sent to Formax.

Special Instructions

See attachment.

2. Questions From Formax

Question

Questions concerning figures and tables: see attachments "email.doc" and "berlin.doc".

Reply

The questions were answered in several stages; the intermediate steps are not relevant here. For the final instructions, see "Euclid 1966.txt".

Final Instructions

Final version of the additional instructions for figures and tables: See attachment.

3. Analysis of the Result

We had requested a short sample and were content with the result.

Findings

Unknown characters: <001> appears once in the text and was not listed in "ques.xls". It is a badly printed 轂 (U+8F42). It is understandable that they marked it as an unknown character, since the character can only be identified from its context.
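To double-check marker counts such as these, a small Perl helper can tally the unknown-character markers in the returned raw text. This is a hypothetical sketch: the function name count_markers is illustrative, and it assumes the raw text is UTF-8 and that every marker consists of exactly three ASCII digits in angle brackets, as <001> and <002> do.

```perl
#!/usr/bin/perl
# Hypothetical helper: count unknown-character markers such as <001> in a
# raw-text file. Assumes UTF-8 input and markers of the form <NNN>.
use strict;
use warnings;

# count_markers: take the whole raw text as one string and return a hash
# reference mapping each marker code (e.g. "002") to its number of occurrences.
sub count_markers {
    my ($text) = @_;
    my %count;
    # [0-9] rather than \d, so that only ASCII digits are accepted;
    # tags such as <pb 六>, <p x> or <lb/> are therefore not counted.
    $count{$1}++ while $text =~ /<([0-9]{3})>/g;
    return \%count;
}

# When a file name is given on the command line, print one line per code.
if (@ARGV) {
    my ($path) = @ARGV;
    open my $fh, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!";
    my $text = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    my $counts = count_markers($text);
    printf "<%s>\t%d\n", $_, $counts->{$_} for sort keys %$counts;
}
```

Usage (file name hypothetical): `perl count_markers.pl raw.txt`.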

<002> appears three times and had already been listed in "ques.xls". In "Ques.pdf" we told them to transcribe it as U+2F88D. Although this code point is not covered by most Unicode fonts (Sun-ExtB does include it), we should state clearly in future work orders that they are expected to find and key every character in Unicode 5.1.0. (Email from 2009-01-05: "There are Chinese characters that we cannot key, and we use <001> and <002> instead of them. We cannot key the word with code 2f88d, so we use <002> instead of it. We have installed the Unicode font, but we cannot find the CJK compatibility ideographs supplement (u+2F800-U+2FA1F) word, can you help us?")
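Characters from the CJK Compatibility Ideographs Supplement are fragile in another way as well: like the other characters in that block, U+2F88D has a singleton canonical decomposition, here to the unified ideograph U+5EB6, so any tool that applies Unicode normalization (NFC or NFD) silently replaces it. The following sketch uses the core Perl module Unicode::Normalize to show this and adds a hypothetical helper, is_cjk_compat_supplement, for flagging such characters in a processed file. Whether normalization played any role in this work order is not known; the check is offered only as a precaution.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

# U+2F88D, the code point requested for <002>.
my $compat = "\x{2F88D}";

# is_cjk_compat_supplement (hypothetical helper): return 1 if the string
# contains any code point from the block U+2F800..U+2FA1F, else 0.
sub is_cjk_compat_supplement {
    my ($s) = @_;
    return $s =~ /[\x{2F800}-\x{2FA1F}]/ ? 1 : 0;
}

# Both normalization forms map the compatibility ideograph to U+5EB6.
printf "NFC: U+%04X\n", ord(NFC($compat));   # prints NFC: U+5EB6
printf "NFD: U+%04X\n", ord(NFD($compat));   # prints NFD: U+5EB6
```

Running is_cjk_compat_supplement over the text before and after each processing step would show whether U+2F88D survived.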

Recommendation

4. Post-Processing

The first XML version of the text is here.

Emendations of the raw text

# normalize the zero in modern page numbers
s!○!〇!g;  # white circle U+25CB --> ideographic number zero U+3007

# ignore outdentation in the preface
if ($line < 153) { s!<p x>!<p>!; } 

# pre-process notes that continue on the next line	

# fill in the unknown characters (MSi)
s!<001>!轂!g;  # s!<001>!<unknown code="001" unicode="8F42">轂</unknown>!g;
s!<002>!<unknown code="002" unicode="2F88D">庶</unknown>!g; # the actual Unicode character 庶 breaks oXygen

# clarify <?> (the list is not complete!)
s!<？>!<?>!; # line 811: fullwidth question mark U+FF1F --> ASCII question mark U+003F
s!愈<\?>!愈!g; # MSi: the reading is correct
s!丙、等。<\?>而戊丙丁、與甲乙丙、又等。!丙、等。而戊丙丁、與甲乙丙、又等。!; # line 1041; "?" escaped so that "<?>" is matched literally
# (line 1041: MSi: It is in the middle of a sentence, but a period at this position is quite common nonetheless.) 

# missing line breaks (the list is not complete!)
s!小於兩直角。則此二橫直線。!小於兩直角。則此二橫<lb/>直線。!; # line 403; may have to do with the neighboring figure
s!俱小於直角。或幷之小於兩直角。!俱小於直角。或幷之小<lb/>於兩直角。!; # line 404

# normalize the hash in the table
s!＃!#!g; # fullwidth number sign U+FF03 --> ASCII hash, i.e. number sign U+0023

# move the only table in the text (ECHO p.327) out of its surrounding sentence
s!却云十六與十二之比例。若!却云十六與十二之比例。!;  # line 4562
s!八與三、及二與四之比例。!若<lb/>八與三、及二與四之比例。!; #line 4573

# misc. emendations
s!N12<114608657010!N12x114608657010!; # line 5: replace "<" in library stamp junk
s!<pb 六><h>幾何原本　卷一之首</h>!<pb 六><rh>幾何原本　卷一之首</rh>!;  # line 245 (obvious mistake)
s!<h>後支前己正論</h>!<p>後支前己正論</p>!;  # line 2175 (Tian Miao: wrong tag)
if ($line == 2992) { s!<h>第三十四題</h>!<h>第十四題</h>!; }  # line 2992 (obvious mistake)
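The emendations above are Perl substitutions applied to one line of raw text at a time, some of them restricted by line number via $line. The following is a minimal sketch of such a driver loop, assuming UTF-8 input and that $line is the 1-based line number of the raw text. The function name emend and the file handling are hypothetical and not taken from the attached Euclid_1966.pl, which may be structured differently; only four of the emendations are included.

```perl
#!/usr/bin/perl
# Hypothetical driver for line-based emendations (not the attached script).
use strict;
use warnings;
use utf8;   # read the CJK literals below as characters, matching decoded input

# emend: apply a few of the emendations listed above to one line of text.
# $line is the 1-based line number, used for line-restricted substitutions.
sub emend {
    my ($text, $line) = @_;
    local $_ = $text;
    s!○!〇!g;                               # white circle U+25CB --> U+3007
    if ($line < 153) { s!<p x>!<p>!; }     # ignore outdentation in the preface
    s!<001>!轂!g;                           # fill in unknown character <001>
    s!＃!#!g;                               # fullwidth U+FF03 --> ASCII U+0023
    return $_;
}

# When a file name is given, write the emended text to standard output.
if (@ARGV) {
    my ($path) = @ARGV;
    open my $in, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!";
    binmode STDOUT, ':encoding(UTF-8)';
    while (my $text = <$in>) {
        print emend($text, $.);    # $. is the line number just read from $in
    }
    close $in;
}
```

Usage (file names hypothetical): `perl emend_sketch.pl raw.txt > emended.txt`. Keeping each substitution in one function makes it easy to test individual lines in isolation.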

Attachments (8)

DESpecs_special_Euclid_1966_2.pdf (902.7 KB) - added by Wolfgang Schmidle 16 years ago.
email.doc (1.1 MB) - added by Wolfgang Schmidle 16 years ago.
berlin.doc (756.0 KB) - added by Wolfgang Schmidle 16 years ago.
Ques.pdf (46.5 KB) - added by Wolfgang Schmidle 16 years ago.
Char code.pdf (16.9 KB) - added by Wolfgang Schmidle 16 years ago. originally an xls file
Euclid 1966.txt (2.4 KB) - added by Wolfgang Schmidle 16 years ago.
Euclid_1966.pl (13.4 KB) - added by Wolfgang Schmidle 16 years ago.
figures_euclid_1966.html (384.1 KB) - added by Klaus Thoden 14 years ago. XQL result for echo:figure as HTML
