Changes between Version 1 and Version 2 of tmp/projectgoals

Timestamp: Feb 2, 2011, 3:05:39 PM (13 years ago)
Author: jwillenborg
Comment: --

tmp/projectgoals

-                      v1
+                      v2
 == Reached project goals (Content-based Web Access) ==
+Requirements (extracted from [https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/mpdl-project-content/raw-attachment/wiki/WikiStart/MPDL_project_desc.pdf project description]):
+=== Requirements (extracted from [https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/mpdl-project-content/raw-attachment/wiki/WikiStart/MPDL_project_desc.pdf project description]) ===
 * Such an environment must offer content-based access to the texts, which includes sophisticated search capabilities that depend in part on natural language processing (NLP).
+* (1) Such an environment must offer content-based access to the texts, which includes sophisticated search capabilities that depend in part on natural language processing (NLP).
 * The current technical infrastructure of ECHO is inadequate for indefinitely maintaining a (growing) collection of this size. Within the MPDL framework we intend to pilot a replacement architecture for ECHO as well as to prepare a migration path for the ECHO content.
+* (2) The current technical infrastructure of ECHO is inadequate for indefinitely maintaining a (growing) collection of this size. Within the MPDL framework we intend to pilot a replacement architecture for ECHO as well as to prepare a migration path for the ECHO content.
 * Present formatted pages of the XML transcription in parallel with digital page images. Digilib will play the primary role in disseminating the latter. Here a basic subservice will extract a given page from an XML fulltext and provide a balanced (or “symmetrized”) version. Such a subservice is needed, since the XML between two page break milestones usually is not a well-formed XML fragment without further processing.
+* (3) Present formatted pages of the XML transcription in parallel with digital page images. Digilib will play the primary role in disseminating the latter. Here a basic subservice will extract a given page from an XML fulltext and provide a balanced (or “symmetrized”) version. Such a subservice is needed, since the XML between two page break milestones usually is not a well-formed XML fragment without further processing.
 * The system is designed to support multiple XML vocabularies which will require minimal configuration information—such as the TEI document type or ECHO document type.
+* (4) The system is designed to support multiple XML vocabularies which will require minimal configuration information—such as the TEI document type or ECHO document type.
 * Subsequent to extraction and production of a balanced XML fragment, the display pipeline involves the following three major steps:
+* (5) Subsequent to extraction and production of a balanced XML fragment, the display pipeline involves the following three major steps:
   * Rendering. Rendering of the balanced XML fragment will be performed with XSLT on the server side, yielding XHTML for the client. XSLT will be readily pluggable, allowing for multiple output options.
   * Enrichment. The generated XHTML will be enriched with: inline images, links to external resources (e.g. Pollux dictionaries via lemmatization provided by Donatus; geospatial data). At this stage, transliteration of various sorts is also possible (should a Greek text be displayed in a Romanization or in Greek characters? should an Arabic text be displayed fully voweled, in its typical rendition, or in Romanization? should a Sanskrit text be displayed in Devanagari, or Tamil, or Romanization, or IPA? a Chinese text in traditional characters, simplified characters, or pinyin?). This is also the layer at which named entity resolution is most appropriately realized.
   * Generation of a synthetic view. The XHTML view will be synchronized and presented in coordination with the appropriate digital image, provided by Digilib. In addition to the basic display environment, a language-sensitive indexing tool needs to be constructed. Such a tool will allow searching a particular text, a corpus, an arbitrarily selected group of texts/corpora, or all texts for one or more natural language words. The search functionality will be developed using an open-source tool (e.g. Lucene) in combination with the NLP technology hosted by Donatus. Thus, for instance, it will be possible to search for all inflected forms of a Latin verb (or only a subset of those forms).
 * There will also be support for accessing texts through human-constructed indices, which reference the texts through XPointer. In this way, scholars will be able to develop an access approach to a given text.
+* (6) There will also be support for accessing texts through human-constructed indices, which reference the texts through XPointer. In this way, scholars will be able to develop an access approach to a given text.
 * A further component of this project is to extend the Arboreal browser to be able to make use (both read/write) of the MPDL repository. This extension will provide scholars with an alternative/complementary access modality. In addition, Arboreal, which is an inherently network-neutral application, will be able to offer storage within the MPDL repository as an alternative strategy for saving content generated within the program.
+* (7) A further component of this project is to extend the Arboreal browser to be able to make use (both read/write) of the MPDL repository. This extension will provide scholars with an alternative/complementary access modality. In addition, Arboreal, which is an inherently network-neutral application, will be able to offer storage within the MPDL repository as an alternative strategy for saving content generated within the program.
+* It is also our intention to integrate a general statistical toolkit currently under development by the Scholarly Computing Group of the MPIWG into this framework.
+* (8) It is also our intention to integrate a general statistical toolkit currently under development by the Scholarly Computing Group of the MPIWG into this framework.
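
The page extraction described in requirement (3) can be illustrated with a minimal sketch. It is not the project's implementation. It assumes that page breaks are empty `<pb/>` milestone elements (as in TEI) and that a fragment is balanced by re-opening the ancestors of the starting milestone and closing whatever is still open at the next one. The names `PageExtractor` and `extract_page` are hypothetical.

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class PageExtractor(xml.sax.ContentHandler):
    """Collect the n-th page (1-based) between <pb/> milestones as balanced XML."""

    def __init__(self, page, milestone="pb"):
        super().__init__()
        self.page = page            # which page to extract
        self.milestone = milestone  # assumed name of the page-break element
        self.seen = 0               # milestones encountered so far
        self.stack = []             # (name, attrs) of currently open elements
        self.out = []               # pieces of the balanced fragment
        self.done = False

    def _recording(self):
        return self.seen == self.page and not self.done

    @staticmethod
    def _open_tag(name, attrs):
        rendered = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
        return f"<{name}{rendered}>"

    def startElement(self, name, attrs):
        if name == self.milestone:
            if self._recording():
                # Next page break reached: close everything still open.
                self.out.extend(f"</{n}>" for n, _ in reversed(self.stack))
                self.done = True
            self.seen += 1
            if self._recording():
                # Requested page starts here: re-open all ancestors.
                self.out.extend(self._open_tag(n, a) for n, a in self.stack)
            return
        self.stack.append((name, dict(attrs.items())))
        if self._recording():
            self.out.append(self._open_tag(name, self.stack[-1][1]))

    def endElement(self, name):
        if name == self.milestone:
            return  # milestones are empty and never pushed on the stack
        self.stack.pop()
        if self._recording():
            self.out.append(f"</{name}>")

    def characters(self, content):
        if self._recording():
            self.out.append(escape(content))


def extract_page(xml_text, page):
    """Return page number `page` of `xml_text` as a well-formed XML string."""
    handler = PageExtractor(page)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "".join(handler.out)


SAMPLE = '<text><p>one <pb n="1"/>two</p><p>three <pb n="2"/>four</p></text>'
```

With the sample above, `extract_page(SAMPLE, 1)` yields a fragment that starts inside the first paragraph and ends inside the second, yet parses as well-formed XML because the enclosing `<text>` and `<p>` elements are re-opened and closed around the page content. The milestone elements themselves are left out of the output in this sketch.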
+=== Reached progress ===
+* (1): fully reached
+* (2): fully reached
+* (3):