| 1 | = High-level requirements for DE Specs = |
| 2 | |
| 3 | == general principles == |
| 4 | * must specify a standard file encoding and unambiguous conventions for entry of non-ASCII characters |
| 5 | * a convention is needed for DE personnel to indicate and record unknown characters |
| 6 | * character entry conventions must be ergonomic and within capabilities of DE firm |
| 7 | * DE output must be plain text, but will not be well-formed XML |
| 8 | * DE markup should be concise and unambiguous |
| 9 | * DE markup should facilitate conversion to target structured XML document |
| 10 | |
| 11 | == required structural features == |
| 12 | * conventions are needed for standard line, paragraph, and page-level structure |
| 13 | * markup needs to indicate not only where a feature starts, but also where it ends, unless automatic inference of the end location is trivial |
| 14 | * must address headers/footers, notes (marginal, foot-, and end-), tables, lists, and figures |
| 15 | * must support multi-column layouts |
| 16 | * must indicate relation of text to commentary, where these are presented on the page together |
| 17 | * must indicate emphasis (e.g. italics) |
| 18 | * must indicate change of typestyle, where this is semantically significant |
| 19 | * conventions are needed for abbreviations |
| 20 | |
| 21 | == expository aspects == |
| 22 | * conventions should be indicated in numbered sections |
| 23 | * language needs to be kept simple and readable for Chinese employees |
| 24 | * complex structural features should be illustrated with one or more examples from actual texts, together with the desired transcription |
| 25 | |
| 26 | == coverage == |
| 27 | * DE is not appropriate where OCR would be more cost-effective |
| 28 | * material needed by the Institute's scientists in the near future should be accommodated |
| 29 | * version targets |
| 30 |   * DE Specs 1.0 should cover printed European books up to the nineteenth century |
| 31 |   * DE Specs 1.1 should add support for Chinese books |
| 32 |   * DE Specs 2.0 should also cover transcriptions, made by students or other personnel, of annotated matter or manuscripts |
| 33 | * out of scope for DE Specs 1.0-2.0 |
| 34 |   * specialized document types such as dictionaries |
| 35 |   * dramatic and verse literature |
| 36 |   * complex formal language content (e.g. mathematics, chemical formulae, musical notation) |
| 37 |   * documents such as notebooks, personal letters, and financial documents |
| 38 |   * twentieth-century material (perhaps with certain exceptions) |
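
Several of the requirements above (plain-text DE output that is not itself XML, explicit start and end markers, a recorded convention for unknown characters, and easy conversion to structured XML) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration only: the markers `{i}`, `{/i}` and `{?}`, the blank-line paragraph convention, and the target element names `text`, `p`, `hi` and `gap` are hypothetical and are not taken from any DE Spec.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical DE markers, chosen only for this sketch.
UNKNOWN = "{?}"                         # one character the typist could not identify
EMPH_OPEN, EMPH_CLOSE = "{i}", "{/i}"   # explicit start and end of emphasis


def paragraph_to_xml(text: str) -> str:
    """Convert one paragraph of hypothetical DE markup to an XML <p> element."""
    # Count check only: it catches a missing marker, not a wrongly ordered one.
    if text.count(EMPH_OPEN) != text.count(EMPH_CLOSE):
        raise ValueError("unbalanced emphasis markers in: " + text)
    xml = escape(text)  # escapes &, < and >; the brace markers are left intact
    xml = xml.replace(EMPH_OPEN, '<hi rend="italic">')
    xml = xml.replace(EMPH_CLOSE, "</hi>")
    xml = xml.replace(UNKNOWN, '<gap reason="illegible"/>')
    return "<p>" + xml + "</p>"


def de_to_xml(source: str) -> str:
    """Split DE text into paragraphs at blank lines and wrap them in <text>."""
    paragraphs = [p for p in re.split(r"\n\s*\n", source) if p.strip()]
    # Collapse line breaks and runs of spaces inside each paragraph.
    body = "\n".join(paragraph_to_xml(" ".join(p.split())) for p in paragraphs)
    return "<text>\n" + body + "\n</text>"


if __name__ == "__main__":
    sample = "Lorem {i}ipsum{/i} d{?}lor & sit\n\nSecond\nparagraph"
    print(de_to_xml(sample))
```

The point of the sketch is that concise markers with an explicit end keep the typist's work simple, while a short script can still produce well-formed XML and reject input in which an end marker is missing.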