
Junqi zaji

Junqi zaji, ECHO

Part of Chin WO B. Sent with DESpecs for Chinese text version 1.2 (see here)

Sent: yes/2009-02-20. Returned: yes/2009-05-04 link

List of unknown characters in this document.

1. First Analysis

Difficulties

Special Instructions

p.3: circled characters "Mark circled characters by ( ), e.g. (甲)." (It seems that this Special Instruction got lost, however. It re-appears as the answer to one question, see below.)

2. Questions From Formax

Q1. If the three books contain out-dented paragraphs, could you please give us a sample about <p x>?

A: In these three books there are indeed no outdented paragraphs. (However, there are indented paragraphs, for example Junqi zaji p.0009, line 3. In the Euclid text Jihe yuanben 幾何原本 there were outdented paragraphs.)

About Junqi zaji

Q2. For this book, ics will be only used for the text in page 0001.jpg, i.e. <ti ics>軍器雜記</ti>. Please confirm.

A: Yes.

Q3. Please see 0005.jpg and 0006.jpg: there are some circled characters, such as 一, 二, 三, 甲, 乙, 丙, etc. Could you please advise whether we should key them as (一), (二), (三), (甲), (乙), (丙), or use unknown characters instead?

A: Please type it as (一), etc.
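The answer above can be sketched in code for the case where circled characters arrive as precomposed Unicode characters (for example from OCR output) rather than being keyed from the image. This is only an illustration, not part of the Specs: the function name `parenthesize_circled` is hypothetical, and the sketch relies on the `<circle>` compatibility decomposition that Python's `unicodedata` module reports for characters such as ㊀ (U+3280). Characters without such a decomposition are left untouched.

```python
import unicodedata


def parenthesize_circled(text: str) -> str:
    """Replace precomposed circled characters by '(base)', e.g. U+3280 -> '(一)'.

    Hypothetical helper: it only handles characters whose Unicode
    decomposition is tagged <circle>; everything else is copied unchanged.
    """
    out = []
    for ch in text:
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith("<circle> "):
            # The decomposition looks like '<circle> 4E00'; skip the tag
            # and rebuild the base character(s) from the hex code points.
            base = "".join(chr(int(cp, 16)) for cp in decomp.split()[1:])
            out.append(f"({base})")
        else:
            out.append(ch)
    return "".join(out)


sample = parenthesize_circled("\u3280開花彈")
```

A circled character that has no precomposed Unicode form would still have to be keyed by hand as (甲), (乙), and so on.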

Q4. Could you please confirm the markup below about figures in 0008.jpg?

01 <p>(二)純鋼造者<sm>如左第\\二圖</sm>其用在攻城砲得力而形式同前惟彈 殼甚薄內膛較大取其
02 多裝炸藥使至敵處崩炸極猛尤易催堅令敵驚怯以制勝則前砲亦可用之
03 <pb>
04 按開花彈論之乃係碰物開炸故凡初開砲時查敵距遠近必先用斯彈以試之
05 或攻城壘等用</p>
06 <fig>
07 <cap>第一啚</cap>
08 <desc>甲</desc>
09 <desc>乙</desc>
10 <desc>炸藥膛</desc>
11 <fig>
12 <cap>第二啚</cap>
13 <desc>炸藥膛</desc>
14 <h 2>(乙)子母彈</h>
15 <p>此種子彈 <sm>如左第\\二圖</sm>其彈殺係 [text omitted]</p>

02: missing </p>

04: missing <p i>

07-10: According to our Specs, this is correct. However, we would appreciate it if you could put all variables in a single <var> </var> tag, just as in the Euclid text Jihe yuanben. It would then look like this:

<fig>
<cap>第一啚</cap>
<desc>炸藥膛</desc>
<var>甲乙</var>

11-13: okay
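The requested regrouping of figure variables can be sketched as follows. This is a minimal illustration under stated assumptions, not a rule from the Specs: the function name `merge_variable_descs` is hypothetical, and it assumes that a "variable" is a single-character label taken from the ten heavenly stems (甲, 乙, 丙, ...), which is how the variables appear in the sample above.

```python
HEAVENLY_STEMS = "甲乙丙丁戊己庚辛壬癸"  # assumption: variables are stem characters


def merge_variable_descs(caption, descs):
    """Build the lines of a <fig> block with all variables in one <var> tag.

    Hypothetical helper: multi-character labels stay as separate <desc>
    elements, single stem characters are concatenated into <var>.
    """
    variables = [d for d in descs if len(d) == 1 and d in HEAVENLY_STEMS]
    labels = [d for d in descs if d not in variables]
    lines = ["<fig>", f"<cap>{caption}</cap>"]
    lines += [f"<desc>{d}</desc>" for d in labels]
    if variables:
        lines.append(f"<var>{''.join(variables)}</var>")
    return lines


fig_lines = merge_variable_descs("第一啚", ["甲", "乙", "炸藥膛"])
```

For the first figure on 0008.jpg this yields the caption, the single description 炸藥膛, and one `<var>甲乙</var>` line, matching the layout requested above.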

Q5. About heading, (1) Heading from TOC in main text will be keyed as <h 1>, please confirm.

A: In the TOC of Junqi zaji, mark the first line as <h> and do not mark the other lines at all. It should look like this:

<toc>
<h> ... </h>
...
...
...
</toc>

For the TOC of Taixi shiwu qiyuan please see the example in the Specs, page 8.

(2) If the paragraphs beginning with circled 甲, circled 乙, circled 丙, etc. have the same indentation as normal paragraphs, we will mark them as <p>, as with the paragraphs in 0003.jpg and 0006.jpg. If they are indented further than normal paragraphs, we will mark them as <h 2>, as with the paragraphs in 0007.jpg, 0008.jpg, and 0009.jpg. However, circled 甲, circled 乙, and circled 丙 in 0021.jpg will also be marked as <p>. Is this right? Please confirm the markup below.

Markup samples for <p>

0003.jpg
<p>(甲)小粒黑藥</p>
<p>(乙)大粒黑藥</p>

0006.jpg
<p>(甲)開花彈</p>
<p>(乙)子母彈</p>

0021.jpg
<p i>(甲)拉火</p>
<p i>(乙)擊火</p>
<p i>(丙)電火</p>

Markup samples for <h 2>

0007.jpg
<h 2>(甲)開花彈</h>

0008.jpg
<h 2>(乙)子母彈</h>

A:

0003, 0006: okay

0021: okay

0007, 0008: please use <p i> instead of <h 2>

Q6. Please see the paragraphs beginning with circled 一, 二, 三, etc. in 0004.jpg, 0005.jpg, 0007.jpg, 0013.jpg, 0016.jpg, 0032.jpg, etc. Could you please confirm whether they should be marked with <p> or <list>?

A: Some lines beginning with circled characters could indeed be interpreted as list items. However, since most of these lines are relatively long, we would like you to use <p> for all these lines (or <p i>, of course).

Q7. Please see the attached Codes.pdf, column 2 is the source characters while column 3 is the corresponding characters that we want to key.

(1) Could you please confirm if lines 1-8 and 10-12 are correct?

(2) For line 9, should we key this character as 隷, 隸, or as an unknown character, i.e. <001>?

A: How to proceed with character variants:

As always, we would like you to provide us with plain text files in Unicode UTF-8 encoding. We wish the texts to be transcribed making use of the full character repertoire of Unicode 5.1. That means, if a variant is encoded as a separate Unicode character (with a unique Unicode codepoint), we wish the variant to be encoded in the transcribed text by the corresponding Unicode character.

If Unicode 5.1 does not provide a distinct codepoint for a variant character, please assign an unknown character code and provide us with the standard variant in the list of unknown characters.

In an e-mail about the Euclid text Jihe yuanben 幾何原本, you said that you cannot type the Unicode character U+2F88D in the CJK Compatibility Ideographs Supplement block (U+2F800 - U+2FA1F) and used <002> instead. The font Sun-ExtB should cover this Unicode block, but some applications may have problems with Unicode characters above U+FFFF. Please tell us if your problems persist.
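The remark about code points above U+FFFF can be made concrete with a short check. This is a sketch only; the helper name `non_bmp_characters` is hypothetical. It shows that U+2F88D needs four bytes in UTF-8 and two code units (a surrogate pair) in UTF-16, which is the usual reason why some applications mishandle such characters, and it lists every character of a string that lies outside the Basic Multilingual Plane.

```python
def non_bmp_characters(text):
    """Return (index, character, 'U+XXXXX') for every code point above U+FFFF."""
    return [
        (i, ch, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ord(ch) > 0xFFFF
    ]


ch = chr(0x2F88D)  # the character mentioned in the e-mail
utf8_bytes = len(ch.encode("utf-8"))            # bytes needed in UTF-8
utf16_units = len(ch.encode("utf-16-be")) // 2  # 16-bit code units in UTF-16
found = non_bmp_characters("幾" + ch)
```

Running such a check over a returned file would show quickly whether any supplementary-plane characters were keyed at all.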

Taken together, we want you to do this:

1. Please use Sun-ExtA and Sun-ExtB if possible.

2a. If a character variant exists as a reference glyph with unique codepoint in Unicode 5.1, type it.

2b. If the character variation does not exist in Unicode 5.1, assign an unknown character code and provide us with the standard variant in the list of unknown characters.
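Step 2b can be sketched as a small registry that hands out unknown character codes and produces the accompanying list. This is an illustration under assumptions, not the vendor's actual tooling: the class name `UnknownCharRegistry`, the glyph identifiers, and the three-digit code format are hypothetical (the format merely mirrors codes such as <001> used on this page).

```python
class UnknownCharRegistry:
    """Assign sequential codes like <001> to variants lacking a Unicode code point."""

    def __init__(self):
        # glyph identifier -> (code, standard variant); insertion order is kept
        self._entries = {}

    def code_for(self, glyph_id, standard_variant):
        """Return the code for a glyph, creating a new one on first use."""
        if glyph_id not in self._entries:
            code = f"<{len(self._entries) + 1:03d}>"
            self._entries[glyph_id] = (code, standard_variant)
        return self._entries[glyph_id][0]

    def listing(self):
        """Return the list of unknown characters with their standard variants."""
        return [f"{code} ({std})" for code, std in self._entries.values()]


registry = UnknownCharRegistry()
first = registry.code_for("Codes.pdf line 2", "砲")
second = registry.code_for("Codes.pdf line 4", "飾")
again = registry.code_for("Codes.pdf line 2", "砲")
```

Keying the registry by a glyph identifier rather than by the standard variant keeps two different unencoded variants of the same standard character apart.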

Regarding the characters in Codes.pdf:

  1. OK
  2. <001> unknown characters list: (砲)
  3. OK (assuming that it is a slip of the pen)
  4. <002> unknown characters list: (飾)
  5. <003> unknown characters list: (絨)
  6. <004> unknown characters list: (墺 U+58BA)
  7. <005> unknown characters list: (曜)
  8. <006> unknown characters list: (紀)
  9. 𨽻 (U+28F7B); it is a variant of 隸, but has a unique codepoint
  10. 痲 (U+75F2), not 痳 (U+75F3)
  11. 神 (U+FA19), not 神 (U+795E)
  12. We cannot identify the character from the image. Please provide us either with a better image or with the name of the text (Junqi zaji?), the page number and the line number.
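One practical caveat for entries such as U+FA19 can be shown with Python's standard `unicodedata` module. U+FA19 is a CJK compatibility ideograph with a canonical decomposition to U+795E, so any Unicode normalization step (NFC as well as NFD) replaces it with the unified ideograph. The sketch below only demonstrates this behaviour; whether any tool in the actual workflow normalizes text is not stated in the correspondence and is an assumption to verify.

```python
import unicodedata

compat = "\uFA19"   # CJK COMPATIBILITY IDEOGRAPH-FA19
unified = "\u795E"  # the unified ideograph 神

# The two characters are distinct code points before normalization ...
distinct_before = compat != unified

# ... but normalization maps the compatibility form onto the unified one.
normalized = unicodedata.normalize("NFC", compat)
decomposition = unicodedata.decomposition(compat)
```

If the distinction between the two code points matters for the transcription, files should therefore be stored and compared without normalization.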

Additional Notes from Formax

  1. We used <?> in the data for some unsure characters.
  2. The text on 0033.jpg and the text on 0034.jpg should be interchanged. The order in which we keyed the two pages is now:

After keying 0032.jpg, we keyed

  1. the text on 0034.jpg,
  2. the figure on 0033.jpg,
  3. the text on 0033.jpg
  4. the figure on 0034.jpg

Also, there is a missing page before the text on 0033.jpg. If you need us to key the missing page, please send it to us.

3. Analysis of the Result

Findings

Recommendation

Last modified on Jul 12, 2010, 10:14:10 AM
