= Test of RLP (Rosette Linguistics Platform) =

== RLP ==
 * RLP: version 6.5.2 (platform dependent)
 * RLP-Lucene: version 6.0.0 (Java library: platform independent)

== Hardware, operating system ==
 * Mac Pro, Dual Core Intel Xeon 2.66 GHz, 4 GB RAM
 * Mac OS X 10.5.4

== Indexing ==
 * done on eXist with Lucene (eXist 1.3dev)
 * document base: Archimedes and Echo: 113 documents, sized between 1 KB and 18 MB; languages: Latin, Italian, English, German, French, Dutch, Greek, Arabic, Chinese
 * needed about 1.4 hours (83 minutes)
 * used full processor time (100%) for most of the run
 * RAM consumption OK (< 1000 MB)

== Result / Quality of indexing (random samples) ==
 * indexing result: see morphological index on: [http://xserve07.mpiwg-berlin.mpg.de:30010/mpdl/query.xql MPDL prototype with RLP analyzer] (access only within MPIWG network)
 * example: morphological index lookup in: [http://xserve07.mpiwg-berlin.mpg.de:30010/mpdl/page-query-result.xql?document=/archimedes/la/delfi_fluxu_024_la_1559.xml&pn=1&mode=text&query-type=ftIndexMorph&query=a&query-result-pn=1 for "a" in Delfino, Federico. De fluxu et refluxu aquae maris. Venice, 1559]
 * base form reduction: comparison of RLP and Donatus
   * Latin: Delfino, Federico. De fluxu et refluxu aquae maris. Venice, 1559. Morphological index for "a"
     * RLP: 234 base forms
     * Donatus: 149 base forms
     * RLP misses: 36%
   * Italian: Borro, Girolamo. Del flusso e reflusso del mare. Lucca, 1561. Morphological index for "e"
     * RLP: 221 base forms
     * Donatus: 132 base forms
     * RLP misses: 40%
   * English: Alberti, Leone Battista. Architecture. London, 1755. Morphological index for "b"
     * RLP: 592 base forms
     * Donatus: 367 base forms
     * RLP misses: 38%
   * German: Johann Grunert. Mathematik und Physik. 1920. Morphological index for "f"
     * RLP: 25 base forms
     * Donatus: 16 base forms
     * RLP misses: 36%
   * French: Galilei, Galileo. Les méchaniques. Paris, 1634. Morphological index for "g"
     * RLP: 71 base forms
     * Donatus: 60 base forms
     * RLP misses: 15%
   * Dutch: Stevin, Simon. De Beghinselen der Weegconst. Leyden, 1586. Morphological index for "d"
     * RLP: 159 base forms
     * Donatus: 142 base forms
     * RLP misses: 11%
   * Greek: Epicurus. Varia. Leipzig, 1887. Morphological index for "s"
     * RLP: 253 base forms
     * Donatus: 241 base forms
     * RLP misses: 5%
   * Arabic: Heron Alexandrinus. Mechanica. Leipzig, 1900. Morphological index for "a"
     * RLP: 330 base forms
     * Donatus: 325 base forms
     * RLP misses: 2%
   * Chinese: no base form reduction
   * overall: RLP misses xx % in base form reduction compared with Donatus
 * base form reduction of Latin "sunt": comparison of RLP and Donatus (in Benedetti, Giovanni Battista de. Diversarum Speculationum mathematicarum, & physicarum liber. 1585.)
   * RLP: 259 sentence hits
   * Donatus: 1655 sentence hits (with all forms: ens, entibus, entis, eram, eramus, erant, erantque, erat, eratque, erimus, eris, erit, eritin, eritque, eritqueue, ero, erunt, erunt., eruntque, es, esne, esse, essemus, essent, esseque, esset, est, estis, esto, estque, fore, forem, forent, fores, foret, fuam, fuat, fueram, fueramus, fuerant, fueras, fuerat, fuere, fuerim, fuerimus, fuerin, fuerint, fuerintque, fueris, fuerit, fueritne, fueritque, fuero, fuerunt, fui, fuimus, fuisse, fuissent, fuisset, fuit, fuitque, futura, futuram, futurarum, futuras, futuri, futuris, futuro, futurorum, futuros, futurum, futurumst, futurus, sient, siet, sim, simus, sint, sintque, sis, sit, sitis, sitque, sum, sumus, sunt, sunto, suntque)
   * RLP misses: 84%
 * double entries: variants of the same word form lead to different base forms (examples)
   * babylonian, babylonians
   * back­doors, back­-doors, back­-door
   * fleisse, fleissigen, fleiß, fleißig
 * orthographic normalization: erroneous base forms (examples)
   * f., fisi-, e@@et, e@t
   * c.a, c.b, c.d, c.e, c.f, ..., c.sit, ..., c.y, d.c.sit, d-ui, e-tago, fa-cere, face-re
   * ca-liditatem, ca-lor, ca-lorem, ...
   * single characters: a, b, c, ...
 * hit counts: errors (examples)
   * fotografie: 10 hits (actually 5 hits)
   * 編 : 1 hit (actually 15 hits)
 * overall
   * better indexing time than Online-Donatus (60% faster)
   * RLP produces many errors (many more than Donatus)
   * it is not platform independent
   * it is not open software
   * it costs a lot of money
   * therefore: RLP will not be used
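
The "RLP misses" percentages in the base form comparison are not defined in these notes. They appear to match (RLP - Donatus) / RLP for the base form counts, and (Donatus - RLP) / Donatus for the "sunt" sentence hits. The following Python sketch recomputes them under that assumption; both formulas and all function names are inferred for illustration and are not taken from the test itself.

```python
# Base form counts copied from the comparison in this page: (RLP, Donatus).
COUNTS = {
    "latin": (234, 149),
    "italian": (221, 132),
    "english": (592, 367),
    "german": (25, 16),
    "french": (71, 60),
    "dutch": (159, 142),
    "greek": (253, 241),
    "arabic": (330, 325),
}


def base_form_miss_rate(rlp: int, donatus: int) -> float:
    """Assumed formula: share of RLP base forms that Donatus merges away."""
    return (rlp - donatus) / rlp


def hit_miss_rate(rlp_hits: int, donatus_hits: int) -> float:
    """Assumed formula: share of Donatus sentence hits that RLP does not find."""
    return (donatus_hits - rlp_hits) / donatus_hits


def pooled_miss_rate(counts: dict) -> float:
    """One possible aggregate: apply the assumed formula to the summed counts."""
    total_rlp = sum(rlp for rlp, _ in counts.values())
    total_donatus = sum(donatus for _, donatus in counts.values())
    return base_form_miss_rate(total_rlp, total_donatus)


if __name__ == "__main__":
    for language, (rlp, donatus) in COUNTS.items():
        print(f"{language}: {base_form_miss_rate(rlp, donatus):.1%}")
    print(f"sunt: {hit_miss_rate(259, 1655):.1%}")
    print(f"pooled: {pooled_miss_rate(COUNTS):.1%}")
```

Rounded to whole percent, the per-language values agree with the figures listed in the comparison. The pooled value is only one way to aggregate (it weights languages by the size of their sample) and should not be read as the overall figure left open in the notes.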