XML meeting, November 23rd, 2009
- Present: JB, RC, PD, JK, FJK, MS, WS, KT, JW, DW
- Protocol: KT
- Language technology
- Greek
- JW: customized two scripts for Greek to transcode from Betacode to Unicode
- Macron problem still there
- WS: Capital letters problem solved
- Arabic
- JW: first used Aramorph system
- found an Extended Buckwalter on the net
- WS: Extended Buckwalter is for a Qur'an project; not sure whether it is applicable
- Hyphen problem
- MS: hyphens are Buckwalter artefacts: they should not be displayed, but should not be deleted either
- PD: hyphens come from the FileMaker file (used by the transcriber to represent blanks)
- JB: are there other characters to be taken care of?
- JW: yes, is working on it with MS
- DW: Writing direction can be set in XHTML
- Normalizing
- RC: please document normalizing steps
- DW: make overview of the architecture
- Work on Liddell-Scott-Jones (LSJ)
- WS: final sigma is wrong in some places
- possible error source: new Lex script
- Problems in transcoding should have been solved already by Perseus
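The Greek transcoding items above (Betacode to Unicode, capital letters, final sigma) can be illustrated with a minimal Python sketch. It is not the script JW customized and not the Lex script mentioned in the notes; the function name `beta_to_unicode` and the tables are assumptions made for illustration. It covers plain letters, `*` capitals, and the common breathing and accent marks, and it chooses final sigma only when no letter follows. Macron markers are deliberately passed through unchanged, since the notes record that problem as open.

```python
import unicodedata

# Beta Code letter -> Greek lowercase letter (illustrative table).
LETTERS = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "q": "θ", "i": "ι", "k": "κ", "l": "λ", "m": "μ",
    "n": "ν", "c": "ξ", "o": "ο", "p": "π", "r": "ρ", "s": "σ",
    "t": "τ", "u": "υ", "f": "φ", "x": "χ", "y": "ψ", "w": "ω",
}

# Beta Code diacritic -> Unicode combining mark.
MARKS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript
}


def beta_to_unicode(text: str) -> str:
    """Hypothetical helper: transcode a Beta Code string to precomposed Unicode."""
    text = text.lower()
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == "*":
            # Capitals: diacritics stand between '*' and the letter.
            j = i + 1
            marks = []
            while j < n and text[j] in MARKS:
                marks.append(MARKS[text[j]])
                j += 1
            if j < n and text[j] in LETTERS:
                out.append(LETTERS[text[j]].upper())
                out.extend(marks)
                i = j + 1
                continue
            out.append(ch)  # stray asterisk, keep as-is
        elif ch == "s":
            # Medial sigma if a letter, a capital marker, or an elision
            # apostrophe follows; final sigma otherwise.
            nxt = text[i + 1] if i + 1 < n else ""
            medial = nxt in LETTERS or nxt in ("*", "'")
            out.append("σ" if medial else "ς")
        elif ch in LETTERS:
            out.append(LETTERS[ch])
        elif ch in MARKS:
            out.append(MARKS[ch])
        else:
            out.append(ch)  # spaces, punctuation, macron markers, etc.
        i += 1
    # Compose base letters and combining marks into precomposed characters.
    return unicodedata.normalize("NFC", "".join(out))
```

A sigma rule that looks only at the following character is easy to audit, which may help when tracing where wrong final sigmas enter the pipeline. A limitation of this sketch is that parentheses used as punctuation would be misread as breathing marks.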
- Priorities
- 1. edo/ sum-problem
- 2. Validation
- 3. role of eXist repository
- 4. new eXist version
- 5. eSciDoc
- 6. Transcoding issues
- DTD/RNG
- WS: implicit validation only through Schema and DTD
- explicit validation through RNG in eXist using Jing
- How to treat the DTD fragment
- DTD fragment supports validation and saves 10% of the documents' size
- Possible solution: separate validation and display of XML
- DW: in general, entities are difficult in XML; it is better to resolve them
- DTD fragment generates errors in a large number of tools
- WS: resolving makes the XML file harder to read and edit
- a simple XSLT script is sufficient to store the XML file in eXist without the fragment
- WS: it would be nice to have a conversion back to the version with the fragment
- can be done via script
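The two directions discussed above, resolving the DTD fragment and converting back to a version with the fragment, can be sketched as follows. The meeting proposed XSLT for this; the Python below is only an illustration of the same idea using the standard library parser, which expands entities declared in the internal subset and drops the DOCTYPE on serialization. The sample document, the entity name `lsj`, and both function names are made up for the example. The way back is a naive string replacement and would need more care on real files.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a document carrying a small DTD fragment.
SAMPLE = """<!DOCTYPE text [
  <!ENTITY lsj "Liddell-Scott-Jones">
]>
<text><p>See &lsj; for the entry.</p></text>"""


def resolve_fragment(xml_source: str) -> str:
    """Parse the document and re-serialize it.

    The parser expands the internal entities, and the serializer does not
    write a DOCTYPE, so the result has no fragment and no entity references.
    """
    root = ET.fromstring(xml_source)
    return ET.tostring(root, encoding="unicode")


def restore_fragment(resolved: str, root_name: str, entities: dict) -> str:
    """Naive way back: re-insert entity references and prepend the fragment.

    Assumption: each replacement string occurs only where the entity was
    originally used. A real script would have to restrict replacement to
    text nodes and handle overlapping values.
    """
    body = resolved
    for name, value in entities.items():
        body = body.replace(value, f"&{name};")
    decls = "\n".join(
        f'  <!ENTITY {name} "{value}">' for name, value in entities.items()
    )
    return f"<!DOCTYPE {root_name} [\n{decls}\n]>\n{body}"


resolved = resolve_fragment(SAMPLE)
restored = restore_fragment(resolved, "text", {"lsj": "Liddell-Scott-Jones"})
```

Keeping the entity table outside the document, as the `entities` argument does here, is one way to make the editable version and the stored version derivable from each other. Entity expansion should only be run on trusted files.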
- Disambiguation problem
- Highest priority
- Possible solution: introduce hyperlemma
- Pollux: change to new dictionaries?
- JB: the updates are not only corrected but also have a different structure
- dictionaries would have to be transformed
- DW: why do we not use the LSJ hosted at Perseus?
- JW: we do not know if the corresponding entries are there
- Perseus is slow
- Abbreviations
- WS: two different kinds of abbreviations
- book specific and commonly used ones
- should offer service
- PD: enhance the morphological server so that it also offers an abbreviation resolver
- similar to docspecs
- schedule a meeting for that
- WS collects material
- PD: WS is solving these problems for the third time
- another task for the IT archaeological meeting
- Also missing: MDH's article on linguistic middleware
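The distinction between book-specific and commonly used abbreviations suggests a two-level lookup, which could look roughly like the sketch below. Nothing here reflects the morphological server's actual interface: the function `resolve_abbreviation`, the book identifier, and all table entries are invented for illustration. Book-specific entries take precedence over common ones.

```python
from collections import ChainMap
from typing import Optional

# Hypothetical data: commonly used abbreviations shared by all books.
COMMON = {
    "e.g.": "exempli gratia",
    "etc.": "et cetera",
}

# Hypothetical data: abbreviations that apply to one book only.
BOOK_SPECIFIC = {
    "example-book": {
        "prop.": "propositio",
        "etc.": "et caetera",  # overrides the common expansion
    },
}


def resolve_abbreviation(abbrev: str, book_id: Optional[str] = None) -> Optional[str]:
    """Look up an abbreviation, preferring the book's own table.

    Returns None when neither table knows the abbreviation.
    """
    table = ChainMap(BOOK_SPECIFIC.get(book_id, {}), COMMON)
    return table.get(abbrev)
```

With these tables, `resolve_abbreviation("etc.", "example-book")` uses the book's own expansion, while the same call without a book identifier falls back to the common table.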
Attachments (1)
- proto-xml-2009-11-23.oo3.zip (3.9 KB) - added by 15 years ago.