
Version 3 (modified by jwillenborg, 13 years ago)


MPDL 2.0

The MPDL backend software release 2.0 has been redesigned so that its important functions (language technology, XML functions) are available as web applications independent of the eXist software. These applications can be used as HTTP servlets and are fully implemented in Java.

Language technology

The language technology module consists of:

  • language technology data (XML data files, Java Berkeley DB databases)
    • morphology data (Perseus, CELEX, Lexique; languages: ara, eng, fre, ger, gre, ita, lat, nld, zho)
    • dictionary data (dictionaries: autenrieth, baretti, bonitz, cooper, florio, lewis-short, lidell-scott-jones, salmone, webster)
  • Java source code
  • required Java libraries
  • web application configuration file (web.xml)

It is available as the web archive file "mpiwg-mpdl-lt.war".

The following servlets are available:

Morphology

  • TokenizeServlet
    • URL: /mpdl/tokenize
    • Request parameters:
      • srcUrl
        • source URL of the full text, which may be:
          • unstructured text
          • XML fragment/document
      • language
        • ISO 639-3 specifier
    • Response output:
      • word tokens
        • list of word tokens (XML)
  • LemmaServlet
    • URL: /mpdl/getLemmas
    • Request parameters:
      • forms
        • one word form (string)
        • list of word forms (XML)
      • language
        • ISO 639-3 specifier
      • normalization (optional; default: no normalization)
        • "reg", "norm", "reg norm"
    • Response output:
      • lemmas
        • one lemma
        • list of lemmas (XML)
  • FormServlet
    • URL: /mpdl/getForms
    • Request parameters:
      • lemmas
        • one lemma (string)
        • list of lemmas (XML)
      • language
        • ISO 639-3 specifier
      • normalization (optional; default: no normalization)
        • "reg", "norm", "reg norm"
    • Response output:
      • forms
        • list of forms (XML)
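
The morphology servlets are plain HTTP endpoints, so a client only needs to assemble the request parameters listed above. The following Java sketch assumes that the parameters are sent as a URL-encoded GET query string and that the web application is deployed under a hypothetical base URL such as http://localhost:8080. The class MorphologyClient and its methods are illustrative names, not part of mpiwg-mpdl-lt.war. Only single word forms or lemmas are passed, because the XML list format is not specified on this page.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical client for the morphology servlets (not MPDL code).
// Assumes GET requests with URL-encoded query parameters.
class MorphologyClient {
    private final String baseUrl; // hypothetical deployment, e.g. "http://localhost:8080"

    MorphologyClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // Joins the non-null parameters into "key=value&..." using form URL encoding.
    static String query(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (e.getValue() == null) {
                continue; // optional parameter left out, so the servlet default applies
            }
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    // /mpdl/tokenize: source URL of the full text plus an ISO 639-3 language code.
    String tokenizeUrl(String srcUrl, String language) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("srcUrl", srcUrl);
        p.put("language", language);
        return baseUrl + "/mpdl/tokenize?" + query(p);
    }

    // /mpdl/getLemmas for one word form; normalization may be null (no normalization).
    String lemmasUrl(String form, String language, String normalization) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("forms", form);
        p.put("language", language);
        p.put("normalization", normalization);
        return baseUrl + "/mpdl/getLemmas?" + query(p);
    }

    // /mpdl/getForms for one lemma; normalization may be null (no normalization).
    String formsUrl(String lemma, String language, String normalization) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("lemmas", lemma);
        p.put("language", language);
        p.put("normalization", normalization);
        return baseUrl + "/mpdl/getForms?" + query(p);
    }

    // Performs the GET request and returns the response body as a string.
    static String fetch(String url) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

For example, MorphologyClient.fetch(client.lemmasUrl("amavit", "lat", "reg norm")) would request the lemmas of a Latin word form with the normalization "reg norm"; the exact shape of the returned XML is not described here.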

Dictionary

  • WordServlet
    • URL: /mpdl/getDictionaryEntries
    • Request parameters:
      • forms or lemmas
        • one form or lemma (string)
        • list of forms or lemmas (XML)
      • inputType (optional; default: "form")
        • "form", "lemma"
      • dictionary (optional; default: all dictionaries)
        • dictionary name (e.g. "webster")
      • language (optional; default: all languages)
        • ISO 639-3 specifier
      • outputType (optional)
        • "full", "compact"
      • normalization (optional; default: no normalization)
        • "reg", "norm", "reg norm"
    • Response output:
      • dictionary entries
        • dictionary entries (XML)
  • DictionaryEnrichServlet
    • URL: /mpdl/enrichByDictionary
    • Request parameters:
      • srcUrl
        • source URL of XML fragment/document
      • language (optional; default: the xml:lang value of the XML document if present, otherwise "eng")
        • ISO 639-3 specifier
      • stopElements (optional; default: empty)
        • elements that should be neither analyzed nor enriched (e.g. "lb")
    • Response output:
      • enriched XML fragment/document
        • the words of the document are extended with links to the dictionaries
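
The two dictionary servlets can be addressed the same way. This sketch again assumes GET requests with URL-encoded parameters and a hypothetical base URL. It omits every optional parameter passed as null, so that the defaults listed above (inputType "form", all dictionaries, all languages, no normalization) apply on the server side. Sending the word as a parameter named "forms" or "lemmas" depending on inputType is an assumption based on the parameter description above. DictionaryRequests and its methods are hypothetical names.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical request builders for the dictionary servlets (not MPDL code).
// Assumes GET requests; optional parameters passed as null are omitted.
class DictionaryRequests {

    enum InputType { FORM, LEMMA }     // sent as "form" / "lemma"
    enum OutputType { FULL, COMPACT }  // sent as "full" / "compact"

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Appends "name=value" only when a value is present.
    private static void add(List<String> params, String name, String value) {
        if (value != null) {
            params.add(enc(name) + "=" + enc(value));
        }
    }

    // Maps an enum constant to its lower-case parameter value, or null.
    private static String lower(Enum<?> e) {
        return e == null ? null : e.name().toLowerCase(Locale.ROOT);
    }

    // /mpdl/getDictionaryEntries for a single form or lemma.
    static String dictionaryEntries(String baseUrl, String word, InputType inputType,
                                    String dictionary, String language,
                                    OutputType outputType, String normalization) {
        List<String> params = new ArrayList<>();
        // Assumption: the word goes into "lemmas" for lemma input, otherwise "forms".
        add(params, inputType == InputType.LEMMA ? "lemmas" : "forms", word);
        add(params, "inputType", lower(inputType));
        add(params, "dictionary", dictionary);
        add(params, "language", language);
        add(params, "outputType", lower(outputType));
        add(params, "normalization", normalization);
        return baseUrl + "/mpdl/getDictionaryEntries?" + String.join("&", params);
    }

    // /mpdl/enrichByDictionary; language and stopElements may be null.
    static String enrichByDictionary(String baseUrl, String srcUrl,
                                     String language, String stopElements) {
        List<String> params = new ArrayList<>();
        add(params, "srcUrl", srcUrl);
        add(params, "language", language);
        add(params, "stopElements", stopElements);
        return baseUrl + "/mpdl/enrichByDictionary?" + String.join("&", params);
    }
}
```

Modelling inputType and outputType as enums keeps their documented values in one place. dictionary and language stay free-form strings because the page gives them only by example.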

Other functions

  • NormalizeServlet
    • URL: /mpdl/normalize
    • Request parameters:
      • srcUrl
        • source URL of XML fragment/document
      • method
        • method of normalization (e.g. "reg", "norm", "reg norm")
      • type
        • type of normalization (e.g. "display", "dictionary", "search")
    • Response output:
      • normalized XML fragment/document
  • TranscodeServlet
    • URL: /mpdl/transcode
    • Request parameters:
      • text
        • text to be transcoded (string)
      • srcEncoding
        • source encoding (e.g. betacode, buckwalter, unicode)
      • destEncoding
        • destination encoding (e.g. betacode, buckwalter, unicode)
    • Response output:
      • transcoded text
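
NormalizeServlet and TranscodeServlet take only string parameters. The sketch below uses the JDK 11 java.net.http client and builds each request separately from sending it, so the request URIs can be inspected without a running server. The base URI, the class TextServices and the use of GET are assumptions. The example input lo/gos is Beta Code for the Greek word λόγος.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for /mpdl/normalize and /mpdl/transcode (not MPDL code).
// Assumes the servlets accept their parameters as a GET query string.
class TextServices {
    private final URI base; // hypothetical, e.g. URI.create("http://localhost:8080/mpdl/")

    TextServices(URI base) {
        this.base = base;
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Builds, but does not send, a request for /mpdl/normalize.
    HttpRequest normalizeRequest(String srcUrl, String method, String type) {
        String query = "srcUrl=" + enc(srcUrl) + "&method=" + enc(method) + "&type=" + enc(type);
        return HttpRequest.newBuilder(base.resolve("normalize?" + query)).GET().build();
    }

    // Builds, but does not send, a request for /mpdl/transcode.
    HttpRequest transcodeRequest(String text, String srcEncoding, String destEncoding) {
        String query = "text=" + enc(text) + "&srcEncoding=" + enc(srcEncoding)
                + "&destEncoding=" + enc(destEncoding);
        return HttpRequest.newBuilder(base.resolve("transcode?" + query)).GET().build();
    }

    // Sends a request and returns the body; any status other than 200 becomes an IOException.
    static String send(HttpClient client, HttpRequest request)
            throws IOException, InterruptedException {
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " for " + request.uri());
        }
        return response.body();
    }
}
```

A call such as TextServices.send(HttpClient.newHttpClient(), services.transcodeRequest("lo/gos", "betacode", "unicode")) would then ask the server for the Unicode form of the Beta Code input.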

XML technology

The XML technology module consists of:

  • Java source code
  • required Java libraries
  • web application configuration file (web.xml)

It is available as the web archive file "mpiwg-mpdl-xml.war".

The following servlets are available:

XPath/XQuery

  • TransformServlet
    • URL: /mpdl/transform
    • Request parameters:
      • srcUrl
        • source URL of XML document
      • xslUrl
        • URL of the XSL stylesheet
    • Response output:
      • transformed document (HTML, XML, etc.)
  • RenderServlet
    • URL: /mpdl/render
    • Request parameters:
      • srcUrl
        • source URL of XML document
    • Response output:
      • rendered document (PDF)
  • XPathServlet
    • URL: /mpdl/xpath
    • Request parameters:
      • srcUrl
        • source URL of XML document
      • xpath
        • XPath expression
    • Response output:
      • result of evaluating the XPath expression on the document
  • XQueryServlet
    • URL: /mpdl/xquery
    • Request parameters:
      • srcUrl
        • source URL of XML document
      • xquery
        • XQuery source code
    • Response output:
      • result of evaluating the XQuery on the document
  • GetFragmentServlet
    • URL: /mpdl/getFragment
    • Request parameters:
      • srcUrl
        • source URL of XML document
      • ms1Name
        • first milestone name, e.g. "pb"
      • ms1Position
        • first milestone position, e.g. 1
      • ms2Name
        • second milestone name, e.g. "pb"
      • ms2Position
        • second milestone position, e.g. 2
    • Response output:
      • XML fragment between the two milestones
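
The XML servlets follow the same request pattern. The sketch below builds request URLs for XPathServlet and GetFragmentServlet under the same assumptions as the other examples: GET requests, URL-encoded parameters and a hypothetical base URL. To show what an XPath request computes, it also evaluates an expression locally with the JDK's javax.xml.xpath API. This local evaluation is a stand-in for illustration, not the servlet's implementation. How the servlet serializes node-set results is not specified on this page. XmlServices and its methods are hypothetical names.

```java
import java.io.StringReader;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helpers for the XML servlets (not MPDL code). The URL builders
// assume GET requests; evaluateXPath is a local JAXP stand-in for /mpdl/xpath.
class XmlServices {

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // /mpdl/xpath: evaluate an XPath expression against the document at srcUrl.
    static String xpathUrl(String baseUrl, String srcUrl, String xpath) {
        return baseUrl + "/mpdl/xpath?srcUrl=" + enc(srcUrl) + "&xpath=" + enc(xpath);
    }

    // /mpdl/getFragment: fragment delimited by two milestones, e.g. pb 1 to pb 2.
    static String fragmentUrl(String baseUrl, String srcUrl,
                              String ms1Name, int ms1Position,
                              String ms2Name, int ms2Position) {
        return baseUrl + "/mpdl/getFragment?srcUrl=" + enc(srcUrl)
                + "&ms1Name=" + enc(ms1Name) + "&ms1Position=" + ms1Position
                + "&ms2Name=" + enc(ms2Name) + "&ms2Position=" + ms2Position;
    }

    // Evaluates an XPath 1.0 expression on an in-memory XML string and returns
    // the string value of the result (no server involved).
    static String evaluateXPath(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate " + expression, e);
        }
    }
}
```

For instance, evaluateXPath("<text><pb n=\"1\"/>a<pb n=\"2\"/>b</text>", "count(//pb)") returns "2", the number of "pb" milestones that a getFragment request could reference through ms1Position and ms2Position.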