RLP
- RLP: version 6.5.2 (platform dependent)
- RLP-Lucene: version 6.0.0 (Java library: platform independent)
Hardware, operating system
- Mac Pro, dual-core Intel Xeon 2.66 GHz, 4 GB RAM
- Mac OS X 10.5.4
Indexing
- done on eXist with Lucene (eXist 1.3dev)
- document base: Archimedes and Echo: 113 documents, between 1 KB and 18 MB in size; languages: Latin, Italian, English, German, French, Dutch, Greek, Arabic, Chinese
- took about 1.4 hours (83 minutes)
- CPU load was at 100% for most of the run
- RAM consumption was acceptable (< 1000 MB)
Result / Quality of indexing (random samples)
- indexing result: see morphological index on: MPDL prototype with RLP analyzer (access only within MPIWG network)
- online example: RLP base form reduction (morphological index lookup in a document) for "a" in Delfino, Federico. De fluxu et refluxu aquae maris. Venice, 1559
- base form reduction: comparison of RLP and Donatus
- Latin: Delfino, Federico. De fluxu et refluxu aquae maris. Venice, 1559. Morphological index for "a"
- RLP: 234 base forms
- Donatus: 149 base forms
- RLP misses: 36%
- Italian: Borro, Girolamo. Del flusso e reflusso del mare. Lucca, 1561. Morphological index for "e"
- RLP: 221 base forms
- Donatus: 132 base forms
- RLP misses: 40%
- English: Alberti, Leone Battista. Architecture. London, 1755. Morphological index for "b"
- RLP: 592 base forms
- Donatus: 367 base forms
- RLP misses: 38%
- German: Grunert, Johann. Mathematik und Physik. 1920. Morphological index for "f"
- RLP: 25 base forms
- Donatus: 16 base forms
- RLP misses: 36%
- French: Galilei, Galileo. Les méchaniques. Paris, 1634. Morphological index for "g"
- RLP: 71 base forms
- Donatus: 60 base forms
- RLP misses: 15%
- Dutch: Stevin, Simon. De Beghinselen der Weegconst. Leyden, 1586. Morphological index for "d"
- RLP: 159 base forms
- Donatus: 142 base forms
- RLP misses: 11%
- Greek: Epicurus. Varia. Leipzig, 1887. Morphological index for "s"
- RLP: 253 base forms
- Donatus: 241 base forms
- RLP misses: 5%
- Arabic: Heron Alexandrinus. Mechanica. Leipzig, 1900. Morphological index for "a"
- RLP: 330 base forms
- Donatus: 325 base forms
- RLP misses: 2%
- Chinese: no base form reduction
- overall: RLP misses xx % of the base form reductions that Donatus performs
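The "RLP misses" percentages in the samples above appear to follow (RLP − Donatus) / RLP, i.e. the share of RLP base forms that Donatus would have merged into another base form. This formula is an assumption inferred from the listed numbers, not something the notes state; it reproduces the listed percentages within rounding. A minimal Python sketch (the names `COUNTS` and `miss_rate` are made up for illustration):

```python
# Base form counts copied from the per-language samples: (RLP, Donatus).
COUNTS = {
    "Latin": (234, 149),
    "Italian": (221, 132),
    "English": (592, 367),
    "German": (25, 16),
    "French": (71, 60),
    "Dutch": (159, 142),
    "Greek": (253, 241),
    "Arabic": (330, 325),
}


def miss_rate(rlp: int, donatus: int) -> float:
    """Percentage of RLP base forms that Donatus reduces further.

    Assumed formula: (RLP - Donatus) / RLP. It is inferred from the
    listed figures and is not documented in the source notes.
    """
    return 100.0 * (rlp - donatus) / rlp


for language, (rlp, donatus) in COUNTS.items():
    print(f"{language}: {miss_rate(rlp, donatus):.1f}%")
```

An overall figure would depend on how the samples are aggregated (pooling all counts versus averaging the per-language rates), which the notes leave open.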
- base form reduction of Latin "sunt": comparison of RLP and Donatus (in Benedetti, Giovanni Battista de. Diversarum speculationum mathematicarum, & physicarum liber. 1585.)
- RLP: 259 sentence hits
- Donatus: 1655 sentence hits (with all forms: ens, entibus, entis, eram, eramus, erant, erantque, erat, eratque, erimus, eris, erit, eritin, eritque, eritqueue, ero, erunt, erunt., eruntque, es, esne, esse, essemus, essent, esseque, esset, est, estis, esto, estque, fore, forem, forent, fores, foret, fuam, fuat, fueram, fueramus, fuerant, fueras, fuerat, fuere, fuerim, fuerimus, fuerin, fuerint, fuerintque, fueris, fuerit, fueritne, fueritque, fuero, fuerunt, fui, fuimus, fuisse, fuissent, fuisset, fuit, fuitque, futura, futuram, futurarum, futuras, futuri, futuris, futuro, futurorum, futuros, futurum, futurumst, futurus, sient, siet, sim, simus, sint, sintque, sis, sit, sitis, sitque, sum, sumus, sunt, sunto, suntque)
- RLP misses: 84%
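For the sentence hits the denominator is different from the base form counts: here Donatus finds more than RLP, so the miss rate is presumably the share of Donatus hits that RLP does not find. A small sketch under that assumption (the function name `missed_hits` is hypothetical):

```python
def missed_hits(rlp_hits: int, donatus_hits: int) -> float:
    """Percentage of Donatus sentence hits that RLP does not return.

    Assumed formula: (Donatus - RLP) / Donatus, inferred from the
    figures for "sunt"; the source notes do not spell it out.
    """
    return 100.0 * (donatus_hits - rlp_hits) / donatus_hits


# Sentence hits for "sunt" in Benedetti (1585): RLP 259, Donatus 1655.
print(f"RLP misses: {missed_hits(259, 1655):.0f}%")
```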
- double entries: the same word leads to several different base forms (examples)
- babylonian, babylonians
- backdoors, back-doors, back-door
- fleisse, fleissigen, fleiß, fleißig
- orthographic normalization: erroneous base forms (examples)
- f., fisi-, e@@et, e@t,
- c.a, c.b, c.d, c.e, c.f, ..., c.sit, ..., c.y, d.c.sit, d-ui, e-tago, fa-cere, face-re
- ca-liditatem, ca-lor, ca-lorem, ...
- single characters: a, b, c, ...
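Erroneous base forms of the kinds listed above (stray punctuation, leftover hyphenation, single letters) could be spotted in a morphological index with a simple heuristic. The following Python sketch is only an illustration and is not part of RLP, Donatus or the prototype; the function `is_suspicious` is hypothetical. It would also flag legitimate hyphenated words such as "back-door", so its output needs manual review.

```python
def is_suspicious(base_form: str) -> bool:
    """Flag base forms that look like tokenization or normalization debris."""
    # A lone ASCII character ("a", "b", ...) is unlikely to be a real lemma;
    # single CJK characters are left alone because they can be full words.
    if len(base_form) == 1 and base_form.isascii():
        return True
    # Any non-letter character (".", "-", "@", ...) marks the form as suspect.
    return not base_form.isalpha()


# Examples taken from the list above, plus three plausible base forms.
samples = ["f.", "fisi-", "e@@et", "c.sit", "fa-cere", "ca-lor", "a",
           "calor", "fleiß", "編"]
flagged = [form for form in samples if is_suspicious(form)]
print(flagged)
```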
- hit counts: errors (examples)
- fotografie: 10 hits reported (actually 5 hits)
- 編 : 1 hit reported (actually 15 hits)
- overall
- indexing is faster than with Online-Donatus (60% faster)
- RLP produces many errors (many more than Donatus)
- RLP is not platform independent
- RLP is not open source
- RLP is expensive
- therefore: RLP will not be used