Version 11 (modified by Wolfgang Schmidle, 14 years ago)

Taixi shiwu qiyuan

Taixi shiwu qiyuan, ECHO

Part of Chin WO B. Sent with DESpecs for Chinese text version 1.2 (see here)

Sent: yes/2009-02-20. Returned: 2009-04-17 (first version, obsolete); resent 2009-04-29 (second version).

(Note: the problem of marking character variants still needs to be discussed.)

List of unknown characters in this document.

1. First Analysis

Difficulties

Special Instructions

2. Questions From Formax

Q1: We plan to ignore 0001.jpg (cover page) and 0004.jpg (blank page) and to start keying this book from 0002.jpg. Is that right?

A1: Please type the characters on page 1. Mark them as <ti> with individualized character style, i.e. use <ti ics>. Page 4 is indeed empty. Please type a <pb> tag on a separate line for every empty page.

Q2: Could you please confirm the markup below for 0002.jpg and 0003.jpg? (Note: we used <sm> for text in small font in 0002.jpg.)

01 <pb>
02 <h>欽命二品頂戴江南分巡蘇松太兵備道袁 爲</h>
03 <p><sm>給示論禁事本年二月十二日接</sm></p>
04 <p><sm>英總領事霍 來函以香港人馮鏡如在上海開設廣智書局繙譯西書刋印出 售請出禁示止翻
05 刻印售並行縣廨一體示禁附具切結聲明局中刋刻各書均係自譯之本等情函致到 道除分行
06 縣委隨時查禁外(text omitted)
07 情弊定行提究不貸其各凜遵毋達切切特示</sm></p>
08 <p>光緒二十八年 三月 初二 日示</p>
09 <h>欽加三品銜賞戴花翎在任候選道特授江蘇上海縣正堂汪 爲</h>
10 <p><sm>出示論禁事奉</sm></p>
11 <p><sm>道憲 札接</sm></p>
12 <p><sm>英總領事霍 來函以香港人馮鏡如在上海開設廣智(text omitted)
13 刻印售並行縣廨一體示禁等由到道札縣示禁等因到縣奉此合行出示禁</sm>爲 <sm>此示仰書業人等知悉嗣後不准將廣智書局刋印各種新書翻刻出售如敢故達定干 查究其各凜遵切特
14 示</sm></p>
15 <p>光緒二十八年 三月 十七 日示</p>
16 <pb>
17 <p ics>日本澁江保編纂
18 泰西事物起原
19 廣智書局印</p>
20 <pb 一a><rh>泰西事物起原 目次</rh>
(text omitted) 

A2: We fully appreciate that the structure of page 2 is difficult to tag. However, the page does not contain normal-size text and small text, but large text and normal-size text. Please remove the <sm> tags on this page. Indicate paragraphs in large text (lines 08 and 15 on this page) by adding "lg" in the paragraph tag, i.e. <p lg>. Do not use "lg" in headings <h>, titles <ti> or paragraphs with individualized character style <p ics>. For the single large character in line 13 please use the new tag <lg> </lg>.

Some individual remarks:

  • 07: 凜 (U+51DC) should be 凛 (U+51DB)
  • 07: 達 is a typo and should be 違
  • 11: The space is actually a large space and should be typed as two ideographic spaces (U+3000).
  • 13: line break is missing after 業
  • 13: same typo 達 as in line 07
  • 13: same wrong character 凜 as in line 07
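
The code-point remark above lends itself to a mechanical check. The following is only a minimal sketch: the helper name `find_wrong_characters` and the replacement table are hypothetical, and the table holds just the one swap named in the remarks (U+51DC to U+51DB). The 達/違 typo is deliberately left out, since 達 is a legitimate character elsewhere and can only be judged in context.

```python
# Hypothetical replacement table: only the code-point swap named in the
# remarks above. 達 -> 違 is context-dependent and therefore not listed.
WRONG_TO_RIGHT = {"\u51dc": "\u51db"}  # 凜 (U+51DC) -> 凛 (U+51DB)


def find_wrong_characters(numbered_lines):
    """Return (line_label, wrong_char, suggested_char) for every hit."""
    hits = []
    for label, text in numbered_lines:
        for ch in text:
            if ch in WRONG_TO_RIGHT:
                hits.append((label, ch, WRONG_TO_RIGHT[ch]))
    return hits


# Two sample lines taken from the keyed text, labelled as in the listing.
sample = [
    ("07", "情弊定行提究不貸其各\u51dc遵毋達切切特示"),
    ("08", "光緒二十八年"),
]
hits = find_wrong_characters(sample)
for label, wrong, right in hits:
    print(f"{label}: U+{ord(wrong):04X} should be U+{ord(right):04X}")
```

A report of this kind only lists candidates; whether a given occurrence really has to be replaced remains an editorial decision.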

A note on the large spaces on this page: In general, large spaces in <h> should be typed as a single ideographic space while large spaces in <p> should be typed as more than one ideographic space, according to its actual size. Since this would be very difficult in paragraphs with large text, your solution to type large spaces in the <p lg> paragraphs (lines 08 and 15) as single ideographic spaces is fine. However, please type the large space in line 11 as two ideographic spaces.
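
The rule for headings can be checked automatically. The sketch below is an illustration under stated assumptions, not part of the DESpecs: the function names `space_runs` and `heading_spaces_ok` are hypothetical, and only the heading rule (a large space in <h> is a single ideographic space) is tested, because the correct number of spaces in a <p> depends on the size seen in the page image.

```python
import re

IDEOGRAPHIC_SPACE = "\u3000"
SPACE_RUN = re.compile(IDEOGRAPHIC_SPACE + "+")


def space_runs(line):
    """Return the length of each run of ideographic spaces in a line."""
    return [len(m.group()) for m in SPACE_RUN.finditer(line)]


def heading_spaces_ok(line):
    """True unless a heading contains a run of more than one U+3000.

    Lines that do not start with an <h> tag are not judged here.
    """
    if not re.match(r"<h[ >]", line):
        return True
    return all(n == 1 for n in space_runs(line))


heading = "<h>欽命二品頂戴江南分巡蘇松太兵備道袁\u3000爲</h>"
paragraph = "<p>道憲\u3000\u3000札接</p>"

print(space_runs(heading), heading_spaces_ok(heading))
print(space_runs(paragraph))
```

For line 11, `space_runs` makes it easy to see whether the large space was typed as two ideographic spaces, as requested.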

page 3: Your way of marking the page is acceptable. However, since these lines name the author, the book title and the publishing house, we would prefer

<ti ics>日本澁江保編纂</ti>
<ti ics>泰西事物起原</ti>
<ti ics>廣智書局印</ti> 

Q3: 0014.jpg and 0142.jpg will also be keyed. Is that right? For example:

<pb 五b><rh>泰西事物起原目次終</rh> 

A3: On both pages, the line of text is not the running head but normal text.

Page 14:

<pb 五b><rh>泰西事物起原 目次</rh>
<p>泰西事物起原目次終</p>

Page 142:

<pb 六十四b><rh>泰西事物起原 第二十三章</rh>
<p>泰西事物起原終</p> 

Q4: Could you please confirm our markup below for headings before paragraphs in 0015.jpg?

<pb 一a><rh>泰西事物起原 第一章</rh>
<ti>泰西事物起原</ti>
<ti>日本 澁江保 編纂</ti>
<ti>上海 廣智書局同人 譯述</ti>
<h1>第一章 天時</h1>
<h2>日月</h2> 

A4: Only one minor point: Please mark heading levels like this:

<h 1>第一章 天時</h>
<h 2>日月</h> 
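
If headings have already been keyed in the numbered form, the change requested in A4 could be applied mechanically. This is a minimal sketch, assuming headings sit on a single line and use a single-digit level; the function name `convert_heading` is hypothetical. The pattern also tolerates a closing tag typed without its slash.

```python
import re

# <hN>text</hN>, also accepting a closing tag that lacks the slash.
HEADING = re.compile(r"<h(\d)>(.*?)</?h\1>")


def convert_heading(line):
    """Rewrite <hN>text</hN> as <h N>text</h>; other lines pass through."""
    return HEADING.sub(r"<h \1>\2</h>", line)


for line in ["<h1>第一章 天時</h1>", "<h2>日月</h2>", "<ti>泰西事物起原</ti>"]:
    print(convert_heading(line))
```

Lines without a numbered heading, such as the <ti> lines, are returned unchanged.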

Q5: The last page of this book, 0143.jpg, will be keyed as follows:

<pb>
<p>光緒二十八年十一月二十日印刷</p>
<p>光緒二十八年十二月二十日發行</p>
<p>(定價大洋三角五分)</p>
<tb>
編纂者#日本澁江保
譯述者#上海廣智書局
#<sm>上海英界大馬路同樂里</sm>
印刷所#廣智書局活版部
#<sm>上海英界大馬路同樂里</sm>
總發行所#廣智書局
#<sm>上海英界四馬路東首</sm>
賣捌所#日本新民叢報支店
#<sm>泰西事物起原</sm>
</tb>

Is that right?

A5: Yes, that's fine.
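
For later processing, a <tb> block like the one in Q5 can be read into rows. The sketch below assumes that `#` separates the cells of a row and that a leading `#` means an empty first cell; this is inferred from the sample only, not quoted from the DESpecs, and the function name `parse_tb` is hypothetical.

```python
def parse_tb(block):
    """Split the lines between <tb> and </tb> into lists of cells.

    Assumption: '#' is the cell separator, so a line that starts with
    '#' yields an empty first cell.
    """
    rows = []
    inside = False
    for line in block.splitlines():
        line = line.strip()
        if line == "<tb>":
            inside = True
        elif line == "</tb>":
            inside = False
        elif inside and line:
            rows.append(line.split("#"))
    return rows


sample = """<tb>
編纂者#日本澁江保
譯述者#上海廣智書局
#<sm>上海英界大馬路同樂里</sm>
</tb>"""

rows = parse_tb(sample)
for row in rows:
    print(row)
```

Inline tags such as <sm> are kept inside the cell text, so the small-font marking is not lost.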

Additional notes from Formax

  1. Besides the IDEOGRAPHIC SPACE (U+3000), we also used the normal space (U+0020) in paragraphs in order to reproduce the layout of the source.
  2. We used <?> in paragraphs for characters we were unsure of.

3. Analysis of the Result

Findings

Recommendation
