
XML meeting, November 23rd, 2009

  • Present: JB, RC, PD, JK, FJK, MS, WS, KT, JW, DW
  • Minutes: KT
  • Language technology
    • Greek
      • JW: adapted two scripts for transcoding Greek from Beta Code to Unicode
        • Macron problem still unresolved
        • WS: Capital letters problem solved
    • Arabic
      • JW: initially used the AraMorph system
      • found an Extended Buckwalter transliteration online
        • WS: Extended Buckwalter was made for a Qur'an project; not sure if it is applicable
      • Hyphen problem
        • MS: the hyphens are Buckwalter artefacts; they should not be displayed (but not deleted either)
        • PD: the hyphens come from the FileMaker file (the transcriber used them to represent blanks)
      • JB: are there other characters to be taken care of?
        • JW: yes, working on it with MS
      • DW: Writing direction can be set in XHTML
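The storage/display split discussed for the Arabic hyphens, together with DW's note on writing direction, can be sketched as follows. This is a minimal Python illustration, not the project's scripts: it assumes the source text uses standard Buckwalter transliteration (only a subset of the table is included), and the names `buckwalter_to_display` and `to_xhtml_span` are hypothetical. Hyphens are dropped only in the rendered output, so the stored source keeps them; since it was not settled whether they stand for blanks, the hypothetical `hyphen_as` parameter lets them be rendered as spaces instead. Right-to-left direction is set with the standard XHTML `dir` attribute.

```python
from html import escape

# Subset of the standard Buckwalter transliteration table:
# hamza forms, letters, tanwin, short vowels, shadda, sukun.
BUCKWALTER = {
    "'": "\u0621", "|": "\u0622", ">": "\u0623", "&": "\u0624",
    "<": "\u0625", "}": "\u0626", "A": "\u0627", "b": "\u0628",
    "p": "\u0629", "t": "\u062A", "v": "\u062B", "j": "\u062C",
    "H": "\u062D", "x": "\u062E", "d": "\u062F", "*": "\u0630",
    "r": "\u0631", "z": "\u0632", "s": "\u0633", "$": "\u0634",
    "S": "\u0635", "D": "\u0636", "T": "\u0637", "Z": "\u0638",
    "E": "\u0639", "g": "\u063A", "f": "\u0641", "q": "\u0642",
    "k": "\u0643", "l": "\u0644", "m": "\u0645", "n": "\u0646",
    "h": "\u0647", "w": "\u0648", "Y": "\u0649", "y": "\u064A",
    "F": "\u064B", "N": "\u064C", "K": "\u064D", "a": "\u064E",
    "u": "\u064F", "i": "\u0650", "~": "\u0651", "o": "\u0652",
}


def buckwalter_to_display(source: str, hyphen_as: str = "") -> str:
    """Render Buckwalter source as Arabic script for display only.

    The stored source is never modified: hyphens are dropped in the
    output by default, or replaced by `hyphen_as` (e.g. " ")."""
    out = []
    for ch in source:
        if ch == "-":
            out.append(hyphen_as)
        else:
            # Characters outside the table (spaces, digits) pass through.
            out.append(BUCKWALTER.get(ch, ch))
    return "".join(out)


def to_xhtml_span(source: str, hyphen_as: str = "") -> str:
    """Wrap the display form in an XHTML span with right-to-left direction."""
    text = escape(buckwalter_to_display(source, hyphen_as))
    return f'<span xml:lang="ar" lang="ar" dir="rtl">{text}</span>'
```

For example, `buckwalter_to_display("k-tAb")` yields the four Arabic letters of كتاب without the hyphen, while the stored string `"k-tAb"` is left unchanged.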
  • Normalizing
    • RC: please document normalizing steps
    • DW: write an overview of the architecture
  • Work on Liddell-Scott-Jones (LSJ)
    • WS: final sigma is wrong in some places
      • possible error source: new Lex script
    • Transcoding problems should already have been solved by Perseus
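Since wrong final sigmas and capitals both came up as transcoding issues, a minimal sketch of Beta Code to Unicode transcoding that handles capitals and chooses final sigma from context might look like the following. It is not the customized scripts or the Lex script mentioned above. It assumes plain Beta Code letters with the usual breathing, accent, diaeresis and iota-subscript codes, treats `S` as final sigma whenever no letter follows, ignores explicit sigma codes such as `S1`/`S2`/`S3`, and leaves macrons unhandled (conventions for them vary). The function name `betacode_to_unicode` is hypothetical.

```python
import unicodedata

# Beta Code letters (input is read case-insensitively) -> lowercase Greek.
LETTERS = {
    "A": "α", "B": "β", "G": "γ", "D": "δ", "E": "ε", "Z": "ζ",
    "H": "η", "Q": "θ", "I": "ι", "K": "κ", "L": "λ", "M": "μ",
    "N": "ν", "C": "ξ", "O": "ο", "P": "π", "R": "ρ", "S": "σ",
    "T": "τ", "U": "υ", "F": "φ", "X": "χ", "Y": "ψ", "W": "ω",
}
# Beta Code diacritics -> Unicode combining characters.
DIACRITICS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript
}


def betacode_to_unicode(text: str) -> str:
    """Transcode plain Beta Code to NFC-normalized Unicode Greek."""
    out = []
    capital = False  # inside a "*" capital sequence?
    pending = []     # diacritics written between "*" and the letter
    for i, ch in enumerate(text):
        key = ch.upper()
        if ch == "*":
            capital, pending = True, []
        elif ch in DIACRITICS:
            # Capitals carry their diacritics before the letter,
            # lowercase letters after it.
            (pending if capital else out).append(DIACRITICS[ch])
        elif key in LETTERS:
            letter = LETTERS[key]
            if capital:
                out.append(letter.upper())
                out.extend(pending)
                capital, pending = False, []
            else:
                if key == "S":
                    # Final sigma when no Beta Code letter follows.
                    nxt = text[i + 1].upper() if i + 1 < len(text) else ""
                    if nxt not in LETTERS:
                        letter = "\u03c2"
                out.append(letter)
        else:
            out.append(ch)  # spaces, punctuation, unhandled codes
    return unicodedata.normalize("NFC", "".join(out))
```

Composing base letters and combining marks through `unicodedata.normalize("NFC", ...)` avoids maintaining a hand-written table of precomposed polytonic characters.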
  • Priorities
    • 1. edo/sum problem
    • 2. Validation
    • 3. role of eXist repository
    • 4. new eXist version
    • 5. eSciDoc
    • 6. Transcoding issues
  • DTD/ RNG
    • WS: implicit validation only via XML Schema and DTD
      • explicit validation through RNG in eXist using Jing
    • How to treat the DTD fragment
      • The DTD fragment supports validation and reduces the documents' size by 10%
      • Possible solution: separate validation and display of XML
      • DW: in general, entities are difficult to handle in XML; it is better to resolve them
        • the DTD fragment causes errors in many tools
      • WS: resolving the entities makes the XML file harder to read and edit
      • a simple XSLT script is sufficient to store the XML file in eXist without the fragment
      • WS: it would be nice to be able to convert back to the version with the fragment
        • can be done via a script
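The round trip discussed above (resolve the entities for eXist, then convert back to the version with the DTD fragment) can be sketched as follows. The minutes mention XSLT; this sketch swaps in Python's standard `xml.etree.ElementTree` instead and assumes the entities are declared in the document's internal DTD subset, so the parser can expand them without fetching an external DTD. `resolve_entities` and `restore_fragment` are hypothetical names, and the reverse step is a naive text substitution that would also replace any accidental occurrence of an entity's expansion. Note that ElementTree drops comments and processing instructions when parsing.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def resolve_entities(xml_text: str) -> str:
    """Parse a document whose internal DTD subset declares entities and
    re-serialize it. The parser expands the entity references, and the
    DOCTYPE declaration is not written back out."""
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


def restore_fragment(xml_text: str, entities: Dict[str, str], doctype: str) -> str:
    """Naive reverse step: replace each entity's expansion with its
    reference, longest expansions first, and prepend the DTD fragment."""
    for name, value in sorted(entities.items(), key=lambda kv: len(kv[1]), reverse=True):
        xml_text = xml_text.replace(value, f"&{name};")
    return doctype + xml_text


# Hypothetical example document with a one-entity DTD fragment.
doctype = '<!DOCTYPE entry [<!ENTITY lsj "Liddell-Scott-Jones">]>'
source = doctype + "<entry>&lsj; 9th ed.</entry>"
plain = resolve_entities(source)  # '<entry>Liddell-Scott-Jones 9th ed.</entry>'
restored = restore_fragment(plain, {"lsj": "Liddell-Scott-Jones"}, doctype)
```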
  • Disambiguation problem
    • Highest priority
      • Possible solution: introduce hyperlemma
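One possible reading of the hyperlemma idea is sketched below: when a form cannot be assigned to a single lemma, it is mapped to a combined lemma listing all candidates. This assumes the edo/sum problem concerns Latin forms such as "est", which can belong to either "sum" or "edo"; the table and the function names `lemmatize` and `hyperlemma_members` are hypothetical and illustrative only.

```python
from typing import Dict, List, Set

# Illustrative form -> candidate lemmas table (hypothetical data).
FORM_TO_LEMMAS: Dict[str, Set[str]] = {
    "est": {"sum", "edo"},  # "is" (sum) or "eats" (edo)
    "sunt": {"sum"},
    "erat": {"sum"},
}


def lemmatize(form: str) -> str:
    """Return the single lemma of a form or, if the form is ambiguous,
    a hyperlemma joining all candidate lemmas in sorted order."""
    candidates = FORM_TO_LEMMAS.get(form.lower())
    if not candidates:
        return form  # unknown form: leave it as it is
    return "/".join(sorted(candidates))


def hyperlemma_members(lemma: str) -> List[str]:
    """Expand a (hyper)lemma back into the lemmas it stands for."""
    return lemma.split("/")
```

Sorting the candidates makes the hyperlemma deterministic, so "est" always yields "edo/sum" regardless of the order in which the candidates were recorded.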
  • Pollux: change to new dictionaries?
    • JB: the updated versions are not only corrected but also structured differently
      • the dictionaries would have to be transformed
    • DW: why do we not use the LSJ hosted at Perseus?
      • JW: we do not know if the corresponding entries are there
        • Perseus is slow
  • Abbreviations
    • WS: two different kinds of abbreviations
      • book-specific ones and commonly used ones
      • we should offer a service for them
      • PD: extend the morphological server to also offer an abbreviation resolver
        • similar to docspecs
      • schedule a meeting for that
        • WS collects material
      • PD: WS is solving these problems for the third time
        • another task for the IT archaeological meeting
    • Also missing: MDH's article on linguistic middleware
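The abbreviation resolver proposed above could, at its core, be a two-level lookup, assuming book-specific abbreviations should take precedence over commonly used ones. The sketch below covers only that lookup, not the service on the morphological server; the class name `AbbreviationResolver` and the sample entries are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AbbreviationResolver:
    """Two-level lookup: book-specific abbreviations override common ones."""
    common: Dict[str, str] = field(default_factory=dict)
    by_book: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def resolve(self, abbreviation: str, book: Optional[str] = None) -> Optional[str]:
        if book is not None:
            specific = self.by_book.get(book, {})
            if abbreviation in specific:
                return specific[abbreviation]
        # Fall back to commonly used abbreviations; None if unknown.
        return self.common.get(abbreviation)


# Hypothetical sample entries, for illustration only.
resolver = AbbreviationResolver(
    common={"cf.": "confer", "sc.": "scilicet"},
    by_book={"LSJ": {"Il.": "Homer, Iliad"}},
)
```

Keeping the common table separate means it can be shared by every book, while each book only needs to list its own abbreviations.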
Last modified on Nov 26, 2009, 1:25:48 PM
