Changes between Version 1 and Version 2 of mpdl2.0-design
Timestamp: Sep 5, 2011, 2:53:42 PM
mpdl2.0-design
  = MPDL 2.0 =

- The MPDL 1.0 software is tightly coupled with the XML system eXist. The next MPDL release 2.0 will be redesigned so that many functions (language technology, some basic XML functions) are usable as open services independently of the eXist software. The new functions will be designed in a layered architecture so that they can be used in different workflows and in a more standardized way (API and XML standard output format). All main functions are available as servlets and are fully implemented in Java.
+ The MPDL release 2.0 is redesigned so that all important functions (language technology, XML functions) are usable as web applications independent of the eXist software. They are available as HTTP servlets and fully implemented in Java.

  == Language technology ==

- === Word recognition ===
+ The language technology module consists of:
+  * language technology data (XML data files, Java Berkeley DBs)
+    * morphology data (ara, eng, fre, ger, gre, ita, lat, nld, zho)
+    * dictionary data
+  * Java source code
+  * used Java libraries
+  * web application configuration file (web.xml)

- Input:
-  * text (URL)
-    * unstructured text
-    * XML fragment/document
-  * language (ISO 639-3 specifier)
+ It is available as the web archive file "mpiwg-mpdl-lt.war".

- Output
-  * list of word tokens
-    * words separated by a blank
-    * XML format
+ The following servlets are available:

  === Morphology ===

+  * TokenizeServlet
+    * URL: /mpdl/tokenize
+    * Request parameters:
+      * srcUrl
+        * source URL of fulltext
+          * unstructured text
+          * XML fragment/document
+      * language
+        * ISO 639-3 specifier
+    * Response output:
+      * word tokens
+        * word tokens (XML)
+
+  * LemmaServlet
+    * URL: /mpdl/getLemmas
+    * Request parameters:
+      * forms
+        * one word form (string)
+        * list of word forms (XML)
+      * language
+        * ISO 639-3 specifier
+    * Response output:
+      * lemmas
+        * one lemma
+        * list of lemmas (XML)
+
+  * FormServlet
+    * URL: /mpdl/getForms
+    * Request parameters:
+      * lemmas
+        * one lemma (string)
+        * list of lemmas (XML)
+      * language
+        * ISO 639-3 specifier
+    * Response output:
+      * forms
+        * list of forms (XML)
+
  === Dictionary ===

+  * WordServlet
+    * URL: /mpdl/getDictionaryEntries
+    * Request parameters:
+      * forms
+        * one form (string)
+        * list of forms (XML)
+      * language
+        * ISO 639-3 specifier
+      * type
+        * full, compact
+    * Response output:
+      * dictionary entries
+        * dictionary entries (XML)

- == XML functions ==
+  * DictionaryEnrichServlet
+    * URL: /mpdl/enrichByDictionary
+    * Request parameters:
+      * srcUrl
+        * source URL of XML fragment/document
+    * Response output:
+      * enriched XML fragment/document
+        * words of the document are extended with links to dictionaries
+
+ === Other functions ===
+
+  * NormalizeServlet
+    * URL: /mpdl/normalize
+    * Request parameters:
+      * srcUrl
+        * source URL of XML fragment/document
+      * method
+        * method of normalization (e.g. "reg", "norm", "reg norm")
+      * type
+        * type of normalization (e.g. "display", "dictionary", "search")
+    * Response output:
+      * normalized XML fragment/document
+
+  * TranscodeServlet
+    * URL: /mpdl/transcode
+    * Request parameters:
+      * text
+        * text to be transcoded (string)
+      * srcEncoding
+        * source encoding (e.g. betacode, buckwalter, unicode)
+      * destEncoding
+        * destination encoding (e.g. betacode, buckwalter, unicode)
+    * Response output:
+      * transcoded text
+
+ == XML technology ==
+
+ The XML technology module consists of:
+  * Java source code
+  * used Java libraries
+  * web application configuration file (web.xml)
+
+ It is available as the web archive file "mpiwg-mpdl-xml.war".
+
+ The following servlets are available:

  === XPath/XQuery ===

- === get fragment ===
+  * TransformServlet
+    * URL: /mpdl/transform
+    * Request parameters:
+      * srcUrl
+        * source URL of XML document
+      * xslUrl
+        * URL of XSL document
+    * Response output:
+      * transformed document (HTML, XML, etc.)

+  * RenderServlet
+    * URL: /mpdl/render
+    * Request parameters:
+      * srcUrl
+        * source URL of XML document
+    * Response output:
+      * rendered document (PDF)
+
+  * XPathServlet
+    * URL: /mpdl/xpath
+    * Request parameters:
+      * srcUrl
+        * source URL of XML document
+      * xpath
+        * XPath source code
+    * Response output:
+      * XPath result for that document
+
+  * XQueryServlet
+    * URL: /mpdl/xquery
+    * Request parameters:
+      * srcUrl
+        * source URL of XML document
+      * xquery
+        * XQuery source code
+    * Response output:
+      * XQuery result for that document
+
+  * GetFragmentServlet
+    * URL: /mpdl/getFragment
+    * Request parameters:
+      * srcUrl
+        * source URL of XML document
+      * ms1Name
+        * first milestone name, e.g. "pb"
+      * ms1Position
+        * first milestone position, e.g. 1
+      * ms2Name
+        * second milestone name, e.g. "pb"
+      * ms2Position
+        * second milestone position, e.g. 2
+    * Response output:
+      * XML fragment
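
The servlet URLs and request parameters listed above suggest how a client might address the services. The following is a minimal sketch, not part of the MPDL code base: the class `MpdlRequestUrls` and its methods are hypothetical, and it assumes that the servlets accept their parameters as URL-encoded query parameters of an HTTP GET request and that the deployment prefix (here called `baseUrl`) is known to the caller. Only the servlet paths and parameter names are taken from the page.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper (not part of MPDL): builds request URLs for the servlets listed above. */
final class MpdlRequestUrls {

    private MpdlRequestUrls() {
    }

    /** Appends the servlet path and the URL-encoded query parameters to the base URL. */
    static String build(String baseUrl, String servletPath, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(encode(p.getKey()) + "=" + encode(p.getValue()));
        }
        return baseUrl + servletPath + (params.isEmpty() ? "" : "?" + query);
    }

    /** Form-style URL encoding in UTF-8 (this overload requires Java 10 or later). */
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /** TokenizeServlet: parameters srcUrl and language (ISO 639-3 specifier). */
    static String tokenize(String baseUrl, String srcUrl, String language) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("srcUrl", srcUrl);
        params.put("language", language);
        return build(baseUrl, "/mpdl/tokenize", params);
    }

    /** LemmaServlet: parameters forms and language. */
    static String getLemmas(String baseUrl, String forms, String language) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("forms", forms);
        params.put("language", language);
        return build(baseUrl, "/mpdl/getLemmas", params);
    }

    /** GetFragmentServlet: the fragment is delimited by two named milestones and their positions. */
    static String getFragment(String baseUrl, String srcUrl,
                              String ms1Name, int ms1Position,
                              String ms2Name, int ms2Position) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("srcUrl", srcUrl);
        params.put("ms1Name", ms1Name);
        params.put("ms1Position", String.valueOf(ms1Position));
        params.put("ms2Name", ms2Name);
        params.put("ms2Position", String.valueOf(ms2Position));
        return build(baseUrl, "/mpdl/getFragment", params);
    }
}
```

Under these assumptions, `MpdlRequestUrls.getFragment(baseUrl, srcUrl, "pb", 1, "pb", 2)` would address the text between the first and the second "pb" milestone of the document at `srcUrl`. A `LinkedHashMap` is used so that the parameters appear in the same order as in the lists above; the servlets themselves may not depend on that order.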