Changes between Version 18 and Version 19 of mpdl2.0-design

Timestamp: Oct 27, 2011, 8:55:47 AM
Author: jwillenborg
Comment: --

mpdl2.0-design

-                      v18
+                      v19
 === Morphology ===
-* Url: /mpiwg-mpdl-lt-web/lt/Tokenize
-  * Request parameters:
-    * srcUrl
-      * source URL of fulltext
-        * unstructured text
-        * XML fragment/document
-    * language (if available use xml:lang in XML document else this language value)
-      * ISO 639-3 specifier
-    * normalization (optional; default: without normalization)
-      * "reg", "norm", "reg norm"
-    * dictionary (optional; default: with dictionary)
-      * "yes", "no"
-    * stopElements (optional, default: empty)
-      * elements which should not be analyzed and enriched (e.g. "lb")
-    * outputFormat (optional; default: "xml")
-      * "xml", "string"
-  * Response output:
-    * xml
-      * document enriched by element <w>
-        * e.g. <s><w lang="deu" reg="dies" norm="dies" forms="dies, dieser, dieses, diesen" lemmas="dieser" dictionaries="dwds">Dies</w> <w lang="deu" reg="ist" norm="ist" forms="bin, bist, ist, seid, sind, sein, war, warst, wart" lemmas="sein" dictionaries="dwds">ist</w> <w lang="deu" reg="ein" norm="ein" forms="ein, eines, einer" lemmas="ein" dictionaries="dwds">ein</w> <w lang="deu" reg="satz" norm="satz" forms="satz, sätze, satzes" lemmas="satz" dictionaries="dwds">Satz</w></s>
-    * wordList
-      * word tokens (separated by Blank)
-      * word tokens (XML)
 * Url: /mpiwg-mpdl-lt-web/lt/GetLemmas
   * Request parameters:
 …
 === Text functions ===
+* Url: /mpiwg-mpdl-lt-web/text/Tokenize
+  * Request parameters:
+    * inputString or srcUrl (required)
+      * inputString
+        * string which should be tokenized
+          * unstructured text
+          * XML fragment/document
+      * srcUrl
+        * source URL
+          * unstructured text
+          * XML fragment/document
+    * language (optional)
+      * ISO 639-3 specifier
+      * if the input is XML and an element carries the attribute "xml:lang", that value is used for this element instead
+      * default: "eng"
+    * normalization (optional)
+      * "none"
+      * "norm"
+      * default: "norm"
+    * dictionary (optional)
+      * "yes"
+      * "no"
+      * default: "yes"
+    * stopElements (optional)
+      * list of xml element names which should not be tokenized (e.g. "lb pb")
+      * default: empty list
+    * outputFormat (optional)
+      * "xml"
+      * "string"
+      * default: "xml"
+  * Response output:
+    * outputFormat=xml
+      * tokenized inputString or document (enriched by element <w>)
+        * e.g. <s><w lang="deu" reg="dies" norm="dies" forms="dies, dieser, dieses, diesen" lemmas="dieser" dictionaries="dwds">Dies</w> <w lang="deu" reg="ist" norm="ist" forms="bin, bist, ist, seid, sind, sein, war, warst, wart" lemmas="sein" dictionaries="dwds">ist</w> <w lang="deu" reg="ein" norm="ein" forms="ein, eines, einer" lemmas="ein" dictionaries="dwds">ein</w> <w lang="deu" reg="satz" norm="satz" forms="satz, sätze, satzes" lemmas="satz" dictionaries="dwds">Satz</w></s>
+    * outputFormat=string
+      * word tokens of inputString or document (separated by Blank)
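The v19 Tokenize parameters above can be sketched as a small client helper. This is only an illustration, not part of the design page: the host in `BASE_URL`, the use of a GET query string, the "exactly one of inputString or srcUrl" check, and the helper names `build_tokenize_url` and `parse_tokens` are all assumptions. The parameter names, allowed values, and defaults are taken from the list above, and `stopElements` is sent space-separated as in the "lb pb" example.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumption: hypothetical host; the design page only gives the path.
BASE_URL = "http://localhost:8080"
TOKENIZE_PATH = "/mpiwg-mpdl-lt-web/text/Tokenize"


def build_tokenize_url(input_string=None, src_url=None, language="eng",
                       normalization="norm", dictionary="yes",
                       stop_elements=(), output_format="xml"):
    """Build a query URL for Tokenize using the documented defaults."""
    # Assumption: exactly one of inputString / srcUrl is passed per request.
    if (input_string is None) == (src_url is None):
        raise ValueError("pass exactly one of input_string or src_url")
    if normalization not in ("none", "norm"):
        raise ValueError("normalization must be 'none' or 'norm'")
    if dictionary not in ("yes", "no"):
        raise ValueError("dictionary must be 'yes' or 'no'")
    if output_format not in ("xml", "string"):
        raise ValueError("output_format must be 'xml' or 'string'")

    params = {}
    if input_string is not None:
        params["inputString"] = input_string
    else:
        params["srcUrl"] = src_url
    params["language"] = language  # ISO 639-3 specifier
    params["normalization"] = normalization
    params["dictionary"] = dictionary
    if stop_elements:
        # Space-separated element names, as in the "lb pb" example.
        params["stopElements"] = " ".join(stop_elements)
    params["outputFormat"] = output_format
    return BASE_URL + TOKENIZE_PATH + "?" + urlencode(params)


def parse_tokens(xml_text):
    """Return (token, lemmas, lang) for every <w> element in an xml response."""
    root = ET.fromstring(xml_text)
    return [(w.text, w.get("lemmas"), w.get("lang")) for w in root.iter("w")]


if __name__ == "__main__":
    print(build_tokenize_url(input_string="Dies ist ein Satz", language="deu",
                             stop_elements=["lb", "pb"]))
    sample = ('<s><w lang="deu" lemmas="dieser">Dies</w> '
              '<w lang="deu" lemmas="sein">ist</w></s>')
    print(parse_tokens(sample))
```

The resulting URL could then be fetched with any HTTP client; `parse_tokens` only reads the `<w>` attributes shown in the example response and makes no claim about other elements the service may return.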
 * Url: /mpiwg-mpdl-lt-web/text/Normalize
   * Request parameters