Changes between Version 18 and Version 19 of mpdl2.0-design


Timestamp:
Oct 27, 2011, 8:55:47 AM
Author:
jwillenborg
Comment:

--

  • mpdl2.0-design

    v18 v19  
    116116=== Morphology ===
    117117
    118 * Url: /mpiwg-mpdl-lt-web/lt/Tokenize
    119   * Request parameters:
    120     * srcUrl
    121       * source URL of fulltext
    122         * unstructured text
    123         * XML fragment/document
    124     * language (if available use xml:lang in XML document else this language value)
    125       * ISO 639-3 specifier
    126     * normalization (optional; default: without normalization)
    127       * "reg", "norm", "reg norm"
    128     * dictionary (optional; default: with dictionary)
    129       * "yes", "no"
    130     * stopElements (optional, default: empty)
    131       * elements which should not be analyzed and enriched (e.g. "lb")
    132     * outputFormat (optional; default: "xml")
    133       * "xml", "string"
    134   * Response output:
    135     * xml
    136       * document enriched by element <w>
    137         * e.g. <s><w lang="deu" reg="dies" norm="dies" forms="dies, dieser, dieses, diesen" lemmas="dieser" dictionaries="dwds">Dies</w> <w lang="deu" reg="ist" norm="ist" forms="bin, bist, ist, seid, sind, sein, war, warst, wart" lemmas="sein" dictionaries="dwds">ist</w> <w lang="deu" reg="ein" norm="ein" forms="ein, eines, einer" lemmas="ein" dictionaries="dwds">ein</w> <w lang="deu" reg="satz" norm="satz" forms="satz, sätze, satzes" lemmas="satz" dictionaries="dwds">Satz</w></s>
    138     * wordList
    139       * word tokens (separated by Blank)
    140       * word tokens (XML)
    141 
    142118* Url: /mpiwg-mpdl-lt-web/lt/GetLemmas
    143119  * Request parameters:
     
    229205=== Text functions ===
    230206
     207* Url: /mpiwg-mpdl-lt-web/text/Tokenize
     208  * Request parameters:
     209    * inputString or srcUrl (required)
     210      * inputString
     211        * string which should be tokenized
     212          * unstructured text
     213          * XML fragment/document
     214      * srcUrl
     215        * source URL
     216          * unstructured text
     217          * XML fragment/document
     218    * language (optional)
     219      * ISO 639-3 specifier
     220      * if the input is XML and an element carries an "xml:lang" attribute, that value is used for that element
     221      * default: "eng"
     222    * normalization (optional)
     223      * "none"
     224      * "norm"
     225      * default: "norm"
     226    * dictionary (optional)
     227      * "yes"
     228      * "no"
     229      * default: "yes"
     230    * stopElements (optional)
     231      * list of XML element names which should not be tokenized (e.g. "lb pb")
     232      * default: empty list
     233    * outputFormat (optional)
     234      * "xml"
     235      * "string"
     236      * default: "xml"
     237  * Response output:
     238    * outputFormat=xml
     239      * tokenized inputString or document (enriched with <w> elements)
     240        * e.g. <s><w lang="deu" reg="dies" norm="dies" forms="dies, dieser, dieses, diesen" lemmas="dieser" dictionaries="dwds">Dies</w> <w lang="deu" reg="ist" norm="ist" forms="bin, bist, ist, seid, sind, sein, war, warst, wart" lemmas="sein" dictionaries="dwds">ist</w> <w lang="deu" reg="ein" norm="ein" forms="ein, eines, einer" lemmas="ein" dictionaries="dwds">ein</w> <w lang="deu" reg="satz" norm="satz" forms="satz, sätze, satzes" lemmas="satz" dictionaries="dwds">Satz</w></s>
     241    * outputFormat=string
     242      * word tokens of inputString or document (separated by blanks)
     243
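The text/Tokenize parameters and the XML response described above could be exercised roughly as in the following Python sketch. It is only an illustration under several assumptions that this page does not confirm: the host in `BASE_URL` is a placeholder, the service is assumed to accept the parameters as a GET query string, "inputString or srcUrl" is read as "exactly one of the two", stopElements is assumed to be blank-separated (as the "lb pb" example suggests), and the response is assumed to carry no XML namespaces. The helper names `build_tokenize_url` and `extract_words` are hypothetical.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Placeholder host (assumption); the design page only gives the path.
BASE_URL = "http://localhost:8080"
TOKENIZE_PATH = "/mpiwg-mpdl-lt-web/text/Tokenize"


def build_tokenize_url(input_string=None, src_url=None, language="eng",
                       normalization="norm", dictionary="yes",
                       stop_elements=(), output_format="xml"):
    """Build a query URL for text/Tokenize from the documented parameters.

    Defaults mirror the ones listed in the spec above.
    """
    # Assumption: exactly one of inputString / srcUrl must be supplied.
    if (input_string is None) == (src_url is None):
        raise ValueError("exactly one of input_string or src_url is required")

    params = {}
    if input_string is not None:
        params["inputString"] = input_string
    else:
        params["srcUrl"] = src_url
    params["language"] = language            # ISO 639-3 specifier
    params["normalization"] = normalization  # "none" or "norm"
    params["dictionary"] = dictionary        # "yes" or "no"
    if stop_elements:
        # Assumption: element names are joined by blanks, e.g. "lb pb".
        params["stopElements"] = " ".join(stop_elements)
    params["outputFormat"] = output_format   # "xml" or "string"
    return BASE_URL + TOKENIZE_PATH + "?" + urlencode(params)


def extract_words(xml_text):
    """Return (surface form, lemmas) pairs from an outputFormat=xml response."""
    root = ET.fromstring(xml_text)
    # Each token is wrapped in a <w> element; "lemmas" is one of its attributes.
    return [(w.text, w.get("lemmas")) for w in root.iter("w")]
```

A call such as `build_tokenize_url(input_string="Dies ist ein Satz", language="deu", stop_elements=["lb", "pb"])` yields a URL that can be fetched with any HTTP client; `extract_words` can then be applied to the returned XML to list each token together with its lemmas.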
    231244* Url: /mpiwg-mpdl-lt-web/text/Normalize
    232245  * Request parameters