Changes between Version 107 and Version 108 of WikiStart


Timestamp:
May 4, 2011, 2:35:31 PM (14 years ago)
Author:
Klaus Thoden
Comment:

--

  • WikiStart

    v107 v108  
    42 42
    43 43
    44 == 1. Data Entry Specs WG ==
    45 
    46 === Existing data entry specs ===
    47 
    48 Some preliminary notes on character issues are to be found on the page CharacterIssues. Some old DE specs may be found under LegacySpecs. Raw data entry versions of Archimedes texts can be accessed through the [http://archimedes.mpiwg-berlin.mpg.de/cvs-web/read/cvswebread.cgi/texts/archimedes/raw/ WebCVS interface].
    49 
    50 Malcolm's [wiki:HighLevelRequirements high level requirements] from the large team meeting on 2008-09-11.
    51 
    52 === Text workflow ===
    53 
    54 Our standards for digital images from external vendors can be found [wiki:"Image standards" here].
    55 DFG Practical Guidelines on Digitisation:
    56 [http://www.dfg.de/forschungsfoerderung/wissenschaftliche_infrastruktur/lis/download/praxisregeln_digitalisierung.pdf German], [http://www.dfg.de/forschungsfoerderung/wissenschaftliche_infrastruktur/lis/download/praxisregeln_digitalisierung_en.pdf English].
    57 
    58 Some [wiki:SampleTexts Sample texts] from the ECHO collection. For problems with and requests for ECHO, see [wiki:EchoRemarks here].
    59 
    60 ["Provisional list"] of books to be transcribed: Batch 1 and 2 from this list have been sent to China. ["First evaluation"] of the work sample.
    61 
    62 The next batch of books can be found [http://fm8-server.mpiwg-berlin.mpg.de/fmi/iwp/res/iwp_home.html here] (click on "Digitalisierung"). An overview of the books possibly included in the next batch can be found [wiki:"Intermediate Batch 6" here].
    63 
    64 Some [wiki:FormaxQueries letters] from [http://www.formax.com.cn/ Formax] have been copied to the wiki for further discussion.
    65 
    66 [wiki:OverviewWorkOrders2008 Overview] of the five Work Orders to Formax in 2008.
    67 
    68 [wiki:"Regex from Alvarus" Collection] of Regular Expressions for replacing abbreviations.
    69 
    70 === References on encoding ===
    71 
    72   * [http://www.cs.tut.fi/~jkorpela/chars.html A tutorial on character code issues] (by J. Korpela)
    73   * [http://www.unicode.org/ Unicode Home Page]
    74     * [http://www.unicode.org/charts/ Code Charts By Script (Unicode 5.1)]
    75     * [http://www.unicode.org/reports/tr15/ Unicode Standard Annex #15: UNICODE NORMALIZATION FORMS]
    76   * [http://proquest.safaribooksonline.com/9780596102425/dedication Fonts & Encodings] (O'Reilly book by Y. Haralambous, English ed.)
    77   * [http://search.cpan.org/~sadahiro/Unicode-Normalize-1.03/Normalize.pm Documentation for Unicode::Normalize Perl module] (from [http://www.cpan.org/ CPAN])
    78   * [http://www.w3.org/2003/entities/iso8879doc/overview.html ISO 8879 entities] (from W3C)
    79   * [http://www.tei-c.org The Text Encoding Initiative (TEI)]
    80   * [http://www.tlg.uci.edu/BetaCode.html Beta Code]
    81   * [http://www.geonames.de/codlang.html#script ISO 15924 script codes]
    82 
    83 44 === Additional resources ===
    84 45
     
    87 48 On abbreviations: [http://libcat.mpiwg-berlin.mpg.de/bibsys/FMPro?-db=bibliothekskatalog_mpiwg&-lay=cgi&All_fields=abbreviatu&-format=record.html&-max=1&-find= Lexicon abbreviaturarum] by Adriano Cappelli
    88 49
    89 === Completed specs ===
    90 50
    91 See [wiki:DataEntrySpecs here]. There is also a [wiki:"Images in DESpecs" list] which shows what images were used in the examples.
    92 
    93 === Results from China ===
    94 
    95 See [http://pythia.mpiwg-berlin.mpg.de/department1/mpdl/raw-texts here].
    96 
    97 == 2. Document Schema WG ==
    98 
    99 Malcolm's [wiki:SchemaHighLevelRequirements Schema high level requirements] from the large team meeting on 2008-09-18.
    100 
    101 Two documents that will serve as starting points for the Document Schema can be found [attachment:echo_V1.xml here] and [attachment:ECHO00001A2B3CX_V2.xml here].
    102 
    103 Discussion of the [wiki:Metadata].
    104 
    105 === References ===
    106 
    107 
    108   * Relax NG
    109     * [http://proquest.safaribooksonline.com/0596004214/relax-PREFACE-2 Relax NG] (O'Reilly book by E. van der Vlist)
    110      * [http://books.xmlschemata.org/relaxng/ The GFDL release] of this book, along with updates (html)
    111     * [http://www.thaiopensource.com/relaxng/trang.html trang] (open source schema converter written in Java)
    112     * [http://relaxng.org/compact-tutorial-20030326.html RELAX NG Compact Syntax Tutorial]
    113     * a [attachment:schema.tar.gz zipped tarball] with schemas for some well-known document types
    114     * [http://enlil.museum.upenn.edu/cdl/doc/XDF XDF] : XML Documentation Format (literate programming with Relax NG)
    115   * [http://dublincore.org/ Dublin Core (Metadata)]
    116   * [http://www.loc.gov/standards/iso639-2/ ISO 639-2] (codes for natural languages)
    117   * [http://www.w3.org/TR/NOTE-datetime ISO 8601] (date and time formats; brief reference from W3C)
    118 
    119 === Tools ===
    120 
    121   * [http://xmlstar.sourceforge.net/ XMLStarlet] Command Line XML Toolkit
    122   * [http://www.xmlsoft.org/ libxml2] contains ''xmllint'', a command-line tool for validating XML
    123   * [http://www.id.cbs.dk/~dh/corpus/tools/MXTERMINATOR.html MXTerminator] A tool for sentence boundary detection
    124 51
    125 52 == Documentation about trac ==