High-level requirements for DE Specs
general principles
- must specify a standard file encoding and unambiguous conventions for entry of non-ASCII characters
- a convention is needed for DE personnel to indicate and record unknown characters
- character entry conventions must be ergonomic and within the capabilities of the DE firm
- DE output must be plain text, but will not be well-formed XML
- DE markup should be concise and unambiguous
- DE markup should facilitate conversion to target structured XML document
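As an illustration of the character-entry principles above, the following is a minimal sketch of how keyed ASCII text might be decoded downstream. The `{U+XXXX}` and `{?}` tokens are purely hypothetical conventions invented for this example; the requirements above do not prescribe any particular syntax.

```python
import re

# Hypothetical convention (an assumption, not part of the DE Specs):
#   {U+XXXX}  names a Unicode code point the operator cannot type directly
#   {?}       marks a character the operator could not identify
TOKEN = re.compile(r"\{(U\+([0-9A-Fa-f]{4,6})|\?)\}")


def decode_entry(text):
    """Return (decoded_text, unknown_count) for one piece of keyed text."""
    unknown = 0

    def repl(match):
        nonlocal unknown
        if match.group(1) == "?":
            unknown += 1
            # U+FFFD REPLACEMENT CHARACTER stands in for the unknown glyph
            return "\ufffd"
        # chr() raises ValueError for values beyond U+10FFFF
        return chr(int(match.group(2), 16))

    return TOKEN.sub(repl, text), unknown


decoded, unknown_count = decode_entry("caf{U+00E9} {?}")
```

Counting the unknown-character markers gives a simple per-file figure that could be used to flag pages for review.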
required structural features
- conventions are needed for standard line, paragraph, and page-level structure
- markup needs to indicate not only where a feature starts, but also where it ends, unless automatic inference of the end location is trivial
- must address headers/footers, notes (marginal, foot- and end-), tables, lists, and figures
- must support multi-column layouts
- must indicate relation of text to commentary, where these are presented on the page together
- must indicate emphasis (e.g. italics)
- must indicate change of typestyle, where this is semantically significant
- must provide conventions for abbreviations
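The requirement that markup indicate where a feature ends as well as where it starts can be checked mechanically. The sketch below assumes a hypothetical bracketed notation (`[note]` ... `[/note]`, `[i]` ... `[/i]`) chosen only for illustration; it is not the DE markup itself, which these requirements leave undefined.

```python
import re

# Hypothetical start/end markers: [name] opens a feature, [/name] closes it.
TAG = re.compile(r"\[(/?)([a-z]+)\]")


def unclosed_features(text):
    """Return a list of problems: stray closing markers and features never closed."""
    stack = []
    problems = []
    for match in TAG.finditer(text):
        closing = match.group(1) == "/"
        name = match.group(2)
        if not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            problems.append(f"unexpected [/{name}]")
    # Anything still on the stack was opened but never closed.
    problems.extend(f"unclosed [{name}]" for name in stack)
    return problems


ok = unclosed_features("[note]see [i]Opera[/i] p. 3[/note]")
bad = unclosed_features("text with [i]italics left open")
```

A check of this kind is only possible when end locations are explicit or trivially inferable, which is the point of the requirement above.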
expository aspects
- conventions should be indicated in numbered sections
- language needs to be kept simple and readable for Chinese employees
- complex structural features should be illustrated with one or more examples from actual texts, together with the desired transcription
coverage
- DE is not appropriate where OCR would be more cost-effective
- material needed by the Institute's scientists in the near future should be accommodated
- version targets
  - DE Specs 1.0 should cover printed European books up to the nineteenth century
  - DE Specs 1.1 should add support for Chinese books
  - DE Specs 2.0 should also cover transcriptions, made by students or other personnel, of annotated matter or manuscripts
- out of scope for DE Specs 1.0-2.0
  - specialized document types such as dictionaries
  - dramatic and verse literature
  - complex formal language content (e.g. mathematics, chemical formulae, musical notation)
  - documents such as notebooks, personal letters, and financial documents
  - twentieth-century material (perhaps with certain exceptions)