Version 4 (modified by jwillenborg, 13 years ago)

MPDL 2.0 (backend system)

The MPDL backend software has been redesigned so that as much of it as possible is separated from the eXist database system and can be used independently of eXist (MPDL 2.0). We offer two MPDL libraries:

  • language technology (available as a web archive file "mpiwg-mpdl-lt.war")
  • XML technology (available as a web archive file "mpiwg-mpdl-xml.war")

The remaining MPDL backend software, which needs the functionality of eXist and Lucene (XML document storage and retrieval), has been redesigned so that all important functions are available as HTTP servlets with a specified API.

Language technology

The language technology web archive "mpiwg-mpdl-lt.war" consists of:

  • data (data files, Java Berkeley DB databases)
    • morphology data (Perseus, CELEX, Lexique with languages: ara, eng, fre, ger, gre, ita, lat, nld, zho)
    • dictionary data (dictionaries: autenrieth, baretti, bonitz, cooper, florio, lewis-short, lidell-scott-jones, salmone, webster)
  • Java source code
  • used Java libraries
  • web application configuration file ("web.xml")

The following HTTP servlets are available:

Morphology

  • TokenizeServlet
    • URL: /mpdl/tokenize
    • Request parameters:
      • srcUrl
        • source URL of fulltext
          • unstructured text
          • XML fragment/document
      • language
        • ISO 639-3 specifier
    • Response output:
      • word tokens
        • word tokens (XML)
  • LemmaServlet
    • URL: /mpdl/getLemmas
    • Request parameters:
      • forms
        • one word form (string)
        • list of word forms (XML)
      • language
        • ISO 639-3 specifier
      • normalization (optional; default: without normalization)
        • "reg", "norm", "reg norm"
    • Response output:
      • lemmas
        • one lemma
        • list of lemmas (XML)
  • FormServlet
    • URL: /mpdl/getForms
    • Request parameters:
      • lemmas
        • one lemma (string)
        • list of lemmas (XML)
      • language
        • ISO 639-3 specifier
      • normalization (optional; default: without normalization)
        • "reg", "norm", "reg norm"
    • Response output:
      • forms
        • list of forms (XML)
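
A request to one of the morphology servlets can be sketched as a URL built from the servlet path and its request parameters. This is only a minimal sketch: the host, port and context path in `BASE_URL` are hypothetical, and passing the parameters in a GET query string is an assumption, since the text above names the parameters but not the HTTP method.

```python
from urllib.parse import urlencode

# Hypothetical deployment location; the source only gives the servlet paths.
BASE_URL = "http://localhost:8080/mpiwg-mpdl-lt"


def servlet_url(path: str, params: dict) -> str:
    """Join a servlet path and its request parameters into one URL."""
    # urlencode escapes each value, e.g. the space in "reg norm" becomes "+".
    return f"{BASE_URL}{path}?{urlencode(params)}"


# Lemmas for one Latin word form, with both normalization methods requested.
lemma_url = servlet_url(
    "/mpdl/getLemmas",
    {"forms": "quantitas", "language": "lat", "normalization": "reg norm"},
)
print(lemma_url)
```

The same helper would serve /mpdl/tokenize and /mpdl/getForms, since only the path and the parameter names differ.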

Dictionary

  • WordServlet
    • URL: /mpdl/getDictionaryEntries
    • Request parameters:
      • forms or lemmas or range
        • one form or lemma (string)
        • list of forms or lemmas (XML)
        • range
          • entries beginning with, e.g. "a*"
          • entries from position x to y: e.g. "200-300"
      • inputType (optional; default: "form")
        • "form", "lemma"
      • dictionary (optional; default: all dictionaries)
        • dictionary name, e.g. "webster"
      • language (optional, default: all languages)
        • ISO 639-3 specifier
      • outputType (optional)
        • "full", "compact"
      • outputFormat (optional)
        • "xml", "html"
      • normalization (optional; default: without normalization)
        • "reg", "norm", "reg norm"
    • Response output:
      • dictionary entries
        • dictionary entries (XML/HTML format)
          • large results are divided into result pages
          • with external links
  • DictionaryEnrichServlet
    • URL: /mpdl/enrichByDictionary
    • Request parameters:
      • srcUrl
        • source URL of XML fragment/document
      • language (optional; default: xml:lang of the XML document if present, otherwise "eng")
        • ISO 639-3 specifier
      • stopElements (optional; default: empty)
        • elements which should not be analyzed and enriched (e.g. "lb")
    • Response output:
      • enriched XML fragment/document
        • words of document are extended by links to dictionaries
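
The two range notations accepted by the WordServlet ("a*" and "200-300") can be told apart on the client side before a request is sent. The sketch below is an illustration, not the servlet's own parsing: the function name `parse_range` and the rule that a position range must run from a smaller to a larger number are assumptions.

```python
import re


def parse_range(value: str):
    """Classify a range value as a prefix ("a*") or a position span ("200-300")."""
    span = re.fullmatch(r"(\d+)-(\d+)", value)
    if span:
        start, end = int(span.group(1)), int(span.group(2))
        if start > end:
            raise ValueError(f"range start {start} is after end {end}")
        return ("positions", start, end)
    # A prefix range needs at least one character before the trailing "*".
    if len(value) > 1 and value.endswith("*"):
        return ("prefix", value[:-1])
    raise ValueError(f"not a recognized range: {value!r}")


print(parse_range("a*"))       # ('prefix', 'a')
print(parse_range("200-300"))  # ('positions', 200, 300)
```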

Other functions

  • NormalizeServlet
    • URL: /mpdl/normalize
    • Request parameters:
      • srcUrl
        • source URL of XML fragment/document
      • method
        • method of normalization (e.g. "reg", "norm", "reg norm")
      • type
        • type of normalization (e.g. "display", "dictionary", "search")
    • Response output:
      • normalized XML fragment/document
  • TranscodeServlet
    • URL: /mpdl/transcode
    • Request parameters:
      • text
        • text to be transcoded (string)
      • srcEncoding
        • source encoding (e.g. betacode, buckwalter, unicode)
      • destEncoding
        • destination encoding (e.g. betacode, buckwalter, unicode)
    • Response output:
      • transcoded text
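
Text sent to the TranscodeServlet may contain characters that are reserved in URLs, so it has to be escaped before it is placed in a query string. The sketch below shows this for an illustrative Beta Code string, "mh=nin", whose "=" would otherwise be read as a parameter separator. Sending the text as a query-string parameter is an assumption; the source does not state how the text is transmitted.

```python
from urllib.parse import parse_qs, urlencode

# Parameter names and encoding names are taken from the list above;
# the text value is only an example.
params = {"text": "mh=nin", "srcEncoding": "betacode", "destEncoding": "unicode"}

# urlencode percent-encodes the "=" inside the text value.
query_string = urlencode(params)
print(query_string)  # text=mh%3Dnin&srcEncoding=betacode&destEncoding=unicode

# Decoding the query string gives back the original text unchanged.
decoded = parse_qs(query_string)
print(decoded["text"][0])  # mh=nin
```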

XML technology

The XML technology web archive "mpiwg-mpdl-xml.war" consists of:

  • Java source code
  • used Java libraries
  • web application configuration file ("web.xml")

The following HTTP servlets are available:

XPath/XQuery

  • TransformServlet
    • URL: /mpdl/transform
    • Request parameters:
      • srcUrl
        • source URL of XML document
      • xslUrl
        • URL of XSL document
    • Response output:
      • transformed document (HTML, XML, etc.)
  • RenderServlet
    • URL: /mpdl/render
    • Request parameters:
      • srcUrl
        • source URL of XML document
    • Response output:
      • rendered document (PDF)
  • XPathServlet
    • URL: /mpdl/xpath
    • Request parameters:
      • srcUrl
        • source URL of XML document
      • xpath
        • XPath expression
    • Response output:
      • XPath result for that document
  • XQueryServlet
    • URL: /mpdl/xquery
    • Request parameters:
      • srcUrl
        • source URL of XML document
      • xquery
        • XQuery source code
    • Response output:
      • XQuery result for that document
  • GetFragmentServlet
    • URL: /mpdl/getFragment
    • Request parameters:
      • srcUrl
        • source URL of XML document
      • ms1Name
        • first milestone name, e.g. "pb"
      • ms1Position
        • first milestone position, e.g. 1
      • ms2Name
        • second milestone name, e.g. "pb"
      • ms2Position
        • second milestone position, e.g. 2
    • Response output:
      • XML fragment
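
The milestone parameters of the GetFragmentServlet can be illustrated by requesting a single page. The sketch assumes that the returned fragment is the content between the two milestones and that positions count from 1, as the examples "pb" 1 and "pb" 2 suggest; neither point is stated explicitly above. The helper name `page_fragment_params` and the source URL are hypothetical.

```python
from urllib.parse import urlencode


def page_fragment_params(src_url: str, page: int, milestone: str = "pb") -> dict:
    """Parameters selecting the text between milestone `page` and `page + 1`."""
    if page < 1:
        raise ValueError("page positions are assumed to start at 1")
    return {
        "srcUrl": src_url,
        "ms1Name": milestone,
        "ms1Position": str(page),
        "ms2Name": milestone,
        "ms2Position": str(page + 1),
    }


# First page of a document at a made-up location.
params = page_fragment_params("http://example.org/benedetti_1585.xml", 1)
print("/mpdl/getFragment?" + urlencode(params))
```

Note that the value of srcUrl is itself a URL, so its ":" and "/" characters are percent-encoded when it is placed in the query string.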

XML document storage and retrieval

  • GetDocumentServlet
    • URL: /mpdl/getDoc
    • Request parameters:
      • docId
        • document identifier (e.g. "/echo/la/benedetti_1585.xml")
    • Response output:
      • document
  • DocumentOperationServlet
    • URL: /mpdl/documentOperation
    • Request parameters:
      • operation
        • "create", "update", "delete"
      • srcUrl
      • destFileName (optional; default: file name in URL)
        • destination file name, e.g. "benedetti_1585.xml"
      • destLanguage (optional, default: xml:lang in document or "eng")
        • destination language (ISO 639-3 specifier), e.g. "lat"
    • Response output:
      • job id of scheduled operation
  • QueryServlet
    • URL: /mpdl/query
    • Request parameters:
      • queryType (optional; default: "morphological normalized")
        • "exact", "morphological", "normalized"
      • query
        • attribute query (e.g. "author = 'Benedetti' and language = 'lat'")
        • fulltext query (e.g. "quantitas")
      • docbases (optional, default: all document bases)
        • document bases (e.g. "mpdl", "archimedes-project")
      • orderBy (optional)
        • order query result by fieldname: e.g. "author" or "score" (fulltext queries)
      • resultPageNumber (optional, default: 1)
        • query result hits: page number
      • resultPageSize (optional, default: 100)
        • query result hits: page size
    • Response output:
      • query result (XML format)
  • QueryDocumentServlet
    • URL: /mpdl/queryDoc
    • Request parameters:
      • docId
        • document identifier (e.g. "/echo/la/benedetti_1585.xml")
      • query
        • fulltext query (e.g. "quantitas")
        • morphological fulltext query (e.g. "quantitas")
      • resultPageNumber (optional, default: 1)
        • query result hits: page number
      • resultPageSize (optional, default: 100)
        • query result hits: page size
    • Response output:
      • query result (XML format)
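
The attribute query syntax and the paging parameters of the QueryServlet can be combined as in the following sketch. The helper names are hypothetical, and two points are assumptions: attribute values are taken to contain no single quotes (the source gives no escaping rules), and paging parameters are omitted whenever they equal the documented defaults of 1 and 100.

```python
from urllib.parse import urlencode


def attribute_query(conditions: dict) -> str:
    """Join field/value pairs into the attribute query form shown above."""
    # Assumption: values contain no single quotes, so no escaping is applied.
    return " and ".join(f"{field} = '{value}'" for field, value in conditions.items())


def query_params(query: str, page: int = 1, page_size: int = 100) -> dict:
    """Request parameters for /mpdl/query, leaving out default paging values."""
    params = {"query": query}
    if page != 1:
        params["resultPageNumber"] = str(page)
    if page_size != 100:
        params["resultPageSize"] = str(page_size)
    return params


query = attribute_query({"author": "Benedetti", "language": "lat"})
print(query)  # author = 'Benedetti' and language = 'lat'

# Second page of results, 50 hits per page.
params = query_params(query, page=2, page_size=50)
print("/mpdl/query?" + urlencode(params))
```

A fulltext query such as "quantitas" can be passed to `query_params` directly, without going through `attribute_query`.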