
MPDL 2.0 backend system: software design

The MPDL backend software has been redesigned so that as much of it as possible is separated from the content management system (CMS) and can be used independently of it. We offer two MPDL libraries:

  • XML technology (available as a basic Java library "mpiwg-mpdl-xml.jar" and as a web archive "mpiwg-mpdl-xml-web.war")
  • language technology (available as a basic Java library "mpiwg-mpdl-lt.jar" and as a web archive "mpiwg-mpdl-lt-web.war")

In addition, we offer the MPDL CMS library, which depends on the CMS:

  • CMS technology (available as a basic Java library "mpiwg-mpdl-cms.jar" and as a web archive "mpiwg-mpdl-cms-web.war")

XML technology

See the development state here.

The XML technology web archive "mpiwg-mpdl-xml-web.war" consists of:

  • Basic Java code: mpiwg-mpdl-xml.jar
    • de.mpg.mpiwg.berlin.mpdl.xml.transform
    • de.mpg.mpiwg.berlin.mpdl.xml.xquery
    • de.mpg.mpiwg.berlin.mpdl.xml.pdf
  • Web Java code: mpiwg-mpdl-xml-web.jar
    • de.mpg.mpiwg.berlin.mpdl.servlets.xml
  • external Java libraries
    • saxon9-s9api.jar
    • saxon9.jar
  • web application configuration file: web.xml

The following HTTP servlets are available:

XSL transformation

  • Url: /mpiwg-mpdl-xml-web/transform/Transform
    • Request parameters
      • srcUrl (required)
        • URL of the XML source document
      • xslUrl (required)
        • URL of the XSL document that transforms the XML document
      • parameters (optional)
        • parameters separated by blanks (e.g. "yourParam1=yourValue1 yourParam2=yourValue2")
        • default: no parameters
      • outputProperties (optional)
        • output properties separated by blanks (e.g. "encoding=utf-8 indent=yes")
          • "method=xhtml"
          • "indent=yes"
          • "media-type=text/html"
          • "encoding=utf-8"
          • default: "method=xml indent=yes media-type=text/xml encoding=utf-8"
    • Response output
      • transformed XML document
  • Url: /mpiwg-mpdl-xml-web/transform/GetFragment
    • Request parameters:
      • docId
        • document identifier (e.g. "/echo/la/Benedetti_1585.xml")
      • ms1Name
        • first milestone name, e.g. "pb"
      • ms1Position
        • first milestone position, e.g. 1
      • ms2Name
        • second milestone name, e.g. "pb"
      • ms2Position
        • second milestone position, e.g. 2
    • Response output:
      • XML fragment between the two milestone elements
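
A request to the Transform servlet can be sketched as follows. This is only an illustration: the host "http://localhost:8080", the class and method names, and the assumption that the servlet accepts its parameters as a URL-encoded GET query string are not taken from this page. Only the servlet path and the parameter names are.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class TransformRequestUrl {

    // Hypothetical host; only the servlet path below comes from the list above.
    static final String BASE = "http://localhost:8080";

    /** Joins name/value pairs into a URL-encoded query string (Java 10+). */
    static String query(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    /** Builds a Transform URL; outputProperties may be null to use the servlet default. */
    static String transformUrl(String srcUrl, String xslUrl, String outputProperties) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("srcUrl", srcUrl);
        params.put("xslUrl", xslUrl);
        if (outputProperties != null) {
            params.put("outputProperties", outputProperties);
        }
        return BASE + "/mpiwg-mpdl-xml-web/transform/Transform?" + query(params);
    }

    public static void main(String[] args) {
        System.out.println(transformUrl(
                "http://example.org/doc.xml",
                "http://example.org/style.xsl",
                "method=xhtml encoding=utf-8"));
    }
}
```

The blank between two output properties is encoded as "+" and each "=" inside the value as "%3D", so the property list travels as a single parameter value.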

PDF rendering

  • Url: /mpiwg-mpdl-xml-web/pdf/Render
    • Request parameters:
      • srcUrl
        • source URL of XML document
      • xslUrl
        • URL of XSL document
      • parameters (optional)
        • list of parameters, e.g. "yourParam1=yourValue1 yourParam2=yourValue2"
    • Response output:
      • rendered document (PDF)
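
The blank-separated "name=value" format of the parameters value can be read as in the sketch below. How the servlet itself parses the string is not documented here; the class name and the choice to skip tokens without "=" are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BlankSeparatedParams {

    /** Splits "a=1 b=2" into an ordered map; tokens without a name before '=' are skipped. */
    static Map<String, String> parse(String parameters) {
        Map<String, String> result = new LinkedHashMap<>();
        if (parameters == null || parameters.trim().isEmpty()) {
            return result; // no parameters given
        }
        for (String token : parameters.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq > 0) {
                result.put(token.substring(0, eq), token.substring(eq + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("yourParam1=yourValue1 yourParam2=yourValue2"));
    }
}
```

With this reading, a parameter value cannot itself contain a blank, because the blank already separates the pairs.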

XPath/XQuery

  • Url: /mpiwg-mpdl-xml-web/xquery/XQuery
    • Request parameters:
      • inputString or srcUrl (required)
        • inputString
          • XML string
        • srcUrl
          • source URL of XML document
      • xquery (required)
        • XQuery (or XPath) source code which should be executed
    • Response output:
      • XQuery result
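
What the combination of inputString and an XPath expression amounts to can be tried locally. The sketch below is a stand-in, not the servlet's implementation: it uses the XPath 1.0 engine built into the JDK (javax.xml.xpath) instead of Saxon, so it does not cover XQuery, and the class and method names are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathOnString {

    /** Evaluates an XPath 1.0 expression against an XML string and returns its string value. */
    static String evaluate(String inputString, String xpathExpr) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(xpathExpr, new InputSource(new StringReader(inputString)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XPath expression or XML input", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><pb/><p>quantitas</p><pb/></doc>";
        System.out.println(evaluate(xml, "/doc/p")); // prints "quantitas"
    }
}
```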

Language technology

See the development state here.

The language technology web archive "mpiwg-mpdl-lt-web.war" consists of:

  • data (data files, Java Berkeley DB data)
    • morphology data (Perseus, CELEX, Lexique with languages: ara, eng, fre, ger, gre, ita, lat, nld, zho)
    • dictionary data (dictionaries: autenrieth, baretti, bonitz, cooper, florio, lewis-short, lidell-scott-jones, salmone, webster)
  • Basic Java code: mpiwg-mpdl-lt.jar
    • de.mpg.mpiwg.berlin.mpdl.lt.*
  • Web Java code: mpiwg-mpdl-lt-web.jar
    • de.mpg.mpiwg.berlin.mpdl.servlets.lt.*
  • external Java libraries
    • berkeley-db-3.3.82.jar
    • ...
  • web application configuration file: web.xml

The following HTTP servlets are available:

Morphology

  • Url: /mpiwg-mpdl-lt-web/lt/GetLemmas
    • Request parameters:
      • query (required)
        • one form or lemma (e.g. "revolution") or
        • blank-separated list of forms or lemmas (e.g. "revolution equality brotherliness")
      • inputType (optional)
        • "form"
        • "lemma"
        • default: "form"
      • language (optional)
        • ISO 639-3 specifier
        • default: "eng"
      • outputType (optional)
        • "compact"
        • "full"
        • default: "compact"
      • outputFormat (optional)
        • "html"
        • "xml"
        • "string" (lemma names separated by a blank)
        • default: "xml"
      • normalization (optional)
        • "none"
        • "norm"
        • default: "norm"
    • Response output:
      • depending on outputFormat and outputType: lemma entries in XML, HTML or string format
  • Url: /mpiwg-mpdl-lt-web/lt/GetForms
    • Request parameters:
      • query (required)
        • one lemma (e.g. "revolution") or
        • blank-separated list of lemmas (e.g. "revolution equality brotherliness")
      • language (optional)
        • ISO 639-3 specifier
        • default: "eng"
      • outputType (optional)
        • "compact"
        • "full"
        • default: "compact"
      • outputFormat (optional)
        • "html"
        • "xml"
        • "string" (form names separated by a blank)
        • default: "xml"
      • normalization (optional)
        • "none"
        • "norm"
        • default: "norm"
    • Response output:
      • depending on outputFormat and outputType: form entries in XML, HTML or string format
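
The defaults of the optional GetLemmas parameters listed above can be made explicit on the client side. The following is a minimal sketch with invented class and method names: it overlays caller-supplied values on the documented defaults and rejects a missing query. It does not contact the servlet.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class GetLemmasParams {

    /** Defaults of the optional GetLemmas parameters, as listed on this page. */
    static Map<String, String> defaults() {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("inputType", "form");
        d.put("language", "eng");
        d.put("outputType", "compact");
        d.put("outputFormat", "xml");
        d.put("normalization", "norm");
        return d;
    }

    /** Returns the effective parameters: defaults overlaid by the caller's values. */
    static Map<String, String> effective(String query, Map<String, String> overrides) {
        if (query == null || query.trim().isEmpty()) {
            throw new IllegalArgumentException("query is required");
        }
        Map<String, String> params = new LinkedHashMap<>();
        params.put("query", query);
        params.putAll(defaults());
        params.putAll(overrides);
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("language", "lat");
        overrides.put("outputFormat", "string");
        System.out.println(effective("quantitas", overrides));
    }
}
```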

Dictionary

  • Url: /mpiwg-mpdl-lt-web/lt/GetDictionaryEntries
    • Request parameters
      • query (required)
        • by one form or lemma (e.g. "revolution")
        • by a list of forms or lemmas (e.g. "revolution equality brotherliness")
        • by a prefix range: entries starting with a prefix (e.g. "a*")
      • inputType (optional)
        • "form"
        • "lemma"
        • default: "form"
      • language (optional)
        • ISO 639-3 specifier
        • default: "eng"
      • dictionary (optional)
        • dictionary name, e.g. "webster"
        • default: "all" (all dictionaries for the specified language)
      • outputFormat (optional)
        • "html"
        • "xml"
        • default: "xml"
      • outputType (optional)
        • this parameter can occur many times (e.g. "outputType=morphCompact&outputType=dictCompact")
          • "morphCompact"
          • "dictCompact"
          • "wikiCompact"
          • "allCompact" (all output types compact)
          • "morphFull"
          • "dictFull"
          • "wikiFull"
          • "allFull" (all output types full)
          • default: "allCompact"
      • normalization (optional)
        • "none"
        • "norm"
        • default: "norm"
      • resultPage (optional)
        • works only for range queries
        • page number of the result (e.g. "2": entries from position 51 to 100)
        • default: "1"
      • resultPageSize (optional)
        • works only for range queries
        • page size of the result (e.g. "100": each result page has a size of 100)
        • default: "50"
    • Response output
      • depending on outputFormat and outputType: morphology, dictionary and Wikipedia entries in XML or HTML format
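
The relation between resultPage, resultPageSize and the entry positions of a range query is plain arithmetic. A small worked sketch, with an invented class name and assuming 1-based positions as in the example "2": entries from position 51 to 100:

```java
class ResultPageRange {

    /** Returns {first, last}: the 1-based entry positions covered by a result page. */
    static int[] range(int resultPage, int resultPageSize) {
        if (resultPage < 1 || resultPageSize < 1) {
            throw new IllegalArgumentException("page and page size must be >= 1");
        }
        int first = (resultPage - 1) * resultPageSize + 1;
        int last = resultPage * resultPageSize;
        return new int[] {first, last};
    }

    public static void main(String[] args) {
        int[] r = range(2, 50); // page 2 with the default page size of 50
        System.out.println(r[0] + " to " + r[1]); // prints "51 to 100"
    }
}
```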

Text functions

  • Url: /mpiwg-mpdl-lt-web/text/Tokenize
    • Request parameters:
      • inputString or srcUrl (required)
        • inputString
          • string which should be tokenized
            • unstructured text
            • XML fragment/document
        • srcUrl
          • source URL
            • unstructured text
            • XML fragment/document
      • language (optional)
        • ISO 639-3 specifier
        • if the input is XML and an element carries the attribute "xml:lang", that value is used for this element
        • default: "eng"
      • normalization (optional)
        • "none"
        • "norm"
        • default: "norm"
      • dictionary (optional)
        • "yes"
        • "no"
        • default: "yes"
      • stopElements (optional, default: empty)
        • list of xml element names which should not be tokenized (e.g. "lb pb")
        • default: empty list
      • outputFormat (optional)
        • "xml"
        • "string"
        • default: "xml"
    • Response output:
      • outputFormat=xml
        • tokenized inputString or document (enriched by element <w>)
          • e.g. <s><w lang="deu" form="dies" forms="dies, dieser, dieses, diesen" lemmas="dieser">Dies</w> <w lang="deu" form="ist" forms="bin, bist, ist, seid, sind, sein, war, warst, wart" lemmas="sein">ist</w> <w lang="deu" form="ein" forms="ein, eines, einer" lemmas="ein">ein</w> <w lang="deu" form="satz" forms="satz, sätze, satzes" lemmas="satz">Satz</w></s>
      • outputFormat=string
        • word tokens of inputString or document (separated by blanks)
  • Url: /mpiwg-mpdl-lt-web/text/Normalize
    • Request parameters
      • srcUrl or inputString (required)
        • source URL of XML document or string
      • language (optional)
        • ISO 639-3 specifier
        • default: "eng"
      • type (optional)
        • "dictionary"
        • "display"
        • "search"
        • default: "display"
    • Response output
      • normalized string or XML document
  • Url: /mpiwg-mpdl-lt-web/text/Transcode
    • Request parameters
      • inputString (required)
        • string which should be transcoded
      • srcEncoding (required)
        • "betacode"
        • "buckwalter"
        • "unicode"
      • destEncoding (optional)
        • "betacode"
        • "buckwalter"
        • "unicode"
        • default: "unicode"
    • Response output
      • transcoded string
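
The XML output of Tokenize (outputFormat=xml) can be consumed with any XML parser. The sketch below reads the "lemmas" attribute of each <w> element with the DOM parser that ships with the JDK. The class name is invented, and the input is a shortened version of the example response shown above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TokenizeOutputReader {

    /** Collects the "lemmas" attribute of every <w> element, in document order. */
    static List<String> lemmas(String tokenizedXml) {
        try {
            NodeList words = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(tokenizedXml)))
                    .getElementsByTagName("w");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < words.getLength(); i++) {
                result.add(((Element) words.item(i)).getAttribute("lemmas"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("input could not be parsed as XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<s><w lang=\"deu\" form=\"dies\" forms=\"dies, dieser, dieses, diesen\" lemmas=\"dieser\">Dies</w> "
                + "<w lang=\"deu\" form=\"ist\" forms=\"bin, bist, ist, seid, sind, sein, war, warst, wart\" lemmas=\"sein\">ist</w></s>";
        System.out.println(lemmas(xml)); // prints "[dieser, sein]"
    }
}
```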

CMS technology

The CMS technology web archive "mpiwg-mpdl-cms-web.war" consists of:

  • Basic Java code: mpiwg-mpdl-cms.jar
    • de.mpg.mpiwg.berlin.mpdl.cms.*
  • Web Java code: mpiwg-mpdl-cms-web.jar
    • de.mpg.mpiwg.berlin.mpdl.servlets.cms.*
  • external Java libraries
    • mpiwg-mpdl-xml.jar
    • mpiwg-mpdl-lt.jar
    • ...
  • web application configuration file: web.xml

The following HTTP servlets are available:

  • Url: /mpiwg-mpdl-cms-web/cms/GetDocument
    • Request parameters:
      • docId
        • document identifier (e.g. "/echo/la/Benedetti_1585.xml")
    • Response output:
      • document
  • Url: /mpiwg-mpdl-cms-web/cms/DocumentOperation
    • Request parameters:
      • operation
        • "create", "update", "delete"
      • srcUrl
      • destFileName (optional, default: file name in URL)
        • destination file name, e.g. "benedetti_1585.xml"
      • destLanguage (optional, default: xml:lang in document or "eng")
        • destination language (ISO 639-3 specifier), e.g. "lat"
    • Response output:
      • job id of scheduled operation
  • Url: /mpiwg-mpdl-cms-web/cms/Query
    • Request parameters:
      • queryType (optional, default: "morphological normalized")
        • "exact", "morphological", "normalized"
      • query
        • attribute query (e.g. "author = 'Benedetti' and language = 'lat'")
        • fulltext query (e.g. "quantitas")
      • docbases (optional, default: all document bases)
        • document bases (e.g. "mpdl", "archimedes-project")
      • orderBy (optional)
        • order query result by fieldname: e.g. "author" or "score" (fulltext queries)
      • resultPageNumber (optional, default: 1)
        • query result hits: page number
      • resultPageSize (optional, default: 100)
        • query result hits: page size
    • Response output:
      • query result (XML format)
  • Url: /mpiwg-mpdl-cms-web/cms/QueryDocument
    • Request parameters:
      • docId
        • document identifier (e.g. "/echo/la/benedetti_1585.xml")
      • query
        • fulltext query (e.g. "quantitas")
        • morphological fulltext query (e.g. "quantitas")
      • resultPageNumber (optional, default: 1)
        • query result hits: page number
      • resultPageSize (optional, default: 100)
        • query result hits: page size
    • Response output:
      • query result (XML format)
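
The parameters of DocumentOperation can be assembled as in the following sketch. It restricts the operation to the three documented values with an enum and produces a URL-encoded parameter string. The class and method names are invented, and whether the servlet expects these parameters in a POST body or in the query string is an assumption not covered by this page.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class DocumentOperationRequest {

    /** The three operation values listed above. */
    enum Operation { CREATE, UPDATE, DELETE }

    /** Builds the encoded parameter string; destLanguage may be null to use the default. */
    static String formBody(Operation operation, String srcUrl, String destLanguage) {
        StringBuilder body = new StringBuilder();
        body.append("operation=").append(operation.name().toLowerCase(Locale.ROOT));
        body.append("&srcUrl=").append(URLEncoder.encode(srcUrl, StandardCharsets.UTF_8));
        if (destLanguage != null) {
            body.append("&destLanguage=")
                .append(URLEncoder.encode(destLanguage, StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // example.org is a placeholder source location, not a real document URL.
        System.out.println(formBody(Operation.CREATE,
                "http://example.org/benedetti_1585.xml", "lat"));
    }
}
```

The servlet answers with the job id of the scheduled operation, so a client would keep that id rather than wait for the operation to finish.
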
Last modified on Dec 14, 2011 1:41:08 PM