Changes between Version 1 and Version 2 of tmp/projectgoals


Timestamp: Feb 2, 2011, 3:05:39 PM
Author: jwillenborg
Comment: --

  • tmp/projectgoals

    v1 v2  
    11== Reached project goals (Content-based Web Access) ==
    22
    3 Requirements (extracted from [https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/mpdl-project-content/raw-attachment/wiki/WikiStart/MPDL_project_desc.pdf project description]):
     3=== Requirements (extracted from [https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/mpdl-project-content/raw-attachment/wiki/WikiStart/MPDL_project_desc.pdf project description]) ===
    44
    5 * Such an environment must offer content-based access to the texts, which includes sophisticated search capabilities that depend in part on natural language processing (NLP).
     5* (1) Such an environment must offer content-based access to the texts, which includes sophisticated search capabilities that depend in part on natural language processing (NLP).
    66
    7 * The current technical infrastructure of ECHO is inadequate for indefinitely maintaining a (growing) collection of this size. Within the MPDL framework we intend to pilot a replacement architecture for ECHO as well as to prepare a migration path for the ECHO content.
     7* (2) The current technical infrastructure of ECHO is inadequate for indefinitely maintaining a (growing) collection of this size. Within the MPDL framework we intend to pilot a replacement architecture for ECHO as well as to prepare a migration path for the ECHO content.
    88
    9 * Present formatted pages of the XML transcription in parallel with digital page images. Digilib will play the primary role in disseminating the latter. Here a basic subservice will extract a given page from an XML fulltext and provide a balanced (or “symmetrized”) version. Such a subservice is needed, since the XML between two page break milestones usually is not a well-formed XML fragment without further processing.
     9* (3) Present formatted pages of the XML transcription in parallel with digital page images. Digilib will play the primary role in disseminating the latter. Here a basic subservice will extract a given page from an XML fulltext and provide a balanced (or “symmetrized”) version. Such a subservice is needed, since the XML between two page break milestones usually is not a well-formed XML fragment without further processing.
    1010
    11 * The system is designed to support multiple XML vocabularies which will require minimal configuration information—such as the TEI document type or ECHO document type.
     11* (4) The system is designed to support multiple XML vocabularies which will require minimal configuration information—such as the TEI document type or ECHO document type.
    1212
    13 * Subsequent to extraction and production of a balanced XML fragment, the display pipeline involves the following three major steps:
     13* (5) Subsequent to extraction and production of a balanced XML fragment, the display pipeline involves the following three major steps:
    1414  * Rendering. Rendering of the balanced XML fragment will be performed with XSLT on the server side, yielding XHTML for the client. XSLT will be readily pluggable, allowing for multiple output options.
    1515  * Enrichment. The generated XHTML will be enriched with: inline images, links to external resources (e.g. Pollux dictionaries via lemmatization provided by Donatus; geospatial data). At this stage, transliteration of various sorts is also possible (should a Greek text be displayed in a Romanization or in Greek characters? should an Arabic text be displayed fully voweled, in its typical rendition, or in Romanization? should a Sanskrit text be displayed in Devanagari, or Tamil, or Romanization, or IPA? a Chinese text in traditional characters, simplified characters, or pinyin?). This is also the layer at which named entity resolution is most appropriately realized.
    1616  * Generation of a synthetic view. The XHTML view will be synchronized and presented in coordination with the appropriate digital image, provided by Digilib. In addition to the basic display environment, a language-sensitive indexing tool needs to be constructed. Such a tool will allow searching a particular text, a corpus, an arbitrarily selected group of texts/corpora, or all texts for one or more natural language words. The search functionality will be developed using an open-source tool (e.g. Lucene) in combination with the NLP technology hosted by Donatus. Thus, for instance, it will be possible to search for all inflected forms of a Latin verb (or only a subset of those forms).
    1717
    18 * There will also be support for accessing texts through human-constructed indices, which reference the texts through XPointer. In this way, scholars will be able to develop an access approach to a given text.
     18* (6) There will also be support for accessing texts through human-constructed indices, which reference the texts through XPointer. In this way, scholars will be able to develop an access approach to a given text.
    1919
    20 * A further component of this project is to extend the Arboreal browser to be able to make use (both read/write) of the MPDL repository. This extension will provide scholars with an alternative/complementary access modality. In addition, Arboreal, which is an inherently network-neutral application, will be able to offer storage within the MPDL repository as an alternative strategy for saving content generated within the program.
     20* (7) A further component of this project is to extend the Arboreal browser to be able to make use (both read/write) of the MPDL repository. This extension will provide scholars with an alternative/complementary access modality. In addition, Arboreal, which is an inherently network-neutral application, will be able to offer storage within the MPDL repository as an alternative strategy for saving content generated within the program.
    2121
    22 * It is also our intention to integrate a general statistical toolkit currently under development by the Scholarly Computing Group of the MPIWG into this framework.
     22* (8) It is also our intention to integrate a general statistical toolkit currently under development by the Scholarly Computing Group of the MPIWG into this framework.
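
Requirement (3) asks for a subservice that cuts one page out of an XML fulltext and returns it as a balanced fragment. The following is a minimal sketch of one way to do that, not the project's actual implementation. It assumes page breaks are empty milestone tags named `pb` (as in TEI), and that the input contains no comments, CDATA sections, or `>` characters inside attribute values. The names `balanced_page` and `_assemble` are hypothetical. The idea: remember which elements are open at the starting milestone, reopen them in front of the raw page text, and close whatever is still open at the ending milestone.

```python
import re
import xml.etree.ElementTree as ET

# Matches start, end, and empty-element tags. Assumption: comments, CDATA,
# processing instructions, and ">" inside attribute values do not occur.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.:-]*)([^<>]*?)(/?)>")


def _assemble(prefix, body, still_open):
    """Reopen the ancestors, emit the raw page text, close what is left open."""
    opening = "".join(tag for _, tag in prefix)
    closing = "".join(f"</{name}>" for name, _ in reversed(still_open))
    return opening + body + closing


def balanced_page(xml_text, page_no, pb="pb"):
    """Return the text after the page_no-th <pb/> (1-based) as well-formed XML."""
    open_elems = []   # (name, verbatim start tag) of every currently open element
    prefix = []       # snapshot of open_elems at the starting milestone
    start = None      # offset just past the starting milestone
    seen = 0          # milestones encountered so far
    for m in TAG.finditer(xml_text):
        closing, name, _attrs, empty = m.groups()
        if name == pb and not closing:
            # Milestones are assumed to be empty elements, so never pushed.
            seen += 1
            if seen == page_no:
                start = m.end()
                prefix = list(open_elems)
            elif seen == page_no + 1:
                return _assemble(prefix, xml_text[start:m.start()], open_elems)
        elif closing:
            open_elems.pop()
        elif not empty:
            open_elems.append((name, m.group(0)))
    if start is None:
        raise ValueError(f"page {page_no} not found")
    # Last page: it runs to the end of the document.
    return _assemble(prefix, xml_text[start:], open_elems)


sample = ('<text><div><p>one <pb n="1"/>two</p>'
          '<p>three <pb n="2"/>four</p></div></text>')
page = balanced_page(sample, 1)
ET.fromstring(page)   # parses without error, so the fragment is well-formed
print(page)           # <text><div><p>two</p><p>three </p></div></text>
```

Reopened ancestors keep their original start tags, so attributes such as language or identifier markers survive into the fragment. A production version would more likely work on a real parser event stream than on a regular expression.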
     23
     24=== Reached progress ===
     25
     26* (1): fully reached
     27
     28* (2): fully reached
     29
     30* (3):