Version 13 (modified by jwillenborg, 15 years ago)

Test of RLP (Rosette Linguistics Platform)

RLP

  • RLP: version 6.5.2 (platform dependent)
  • RLP-Lucene: version 6.0.0 (Java library: platform independent)

Hardware, operating system

  • Mac Pro, dual-core Intel Xeon, 2.66 GHz, 4 GB RAM
  • Mac OS X 10.5.4

Indexing

  • done on eXist with Lucene (eXist 1.3dev)
  • document base: Archimedes and Echo: 113 documents, between 1 KB and 18 MB in size; languages: Latin, Italian, English, German, French, Dutch, Greek, Arabic, Chinese
  • took about 1.4 hours (83 minutes)
  • CPU load was at 100% for most of the run
  • RAM consumption was acceptable (< 1000 MB)

Result / Quality of indexing (random samples)

  • application: see MPDL prototype with RLP analyzer (access only within MPIWG network)
  • online example: RLP base form reduction (morphological index lookup in a document) for "a" in Delfino, Federico. De fluxu et refluxu aquae maris. Venice, 1559
  • base form reduction: comparison of RLP and Donatus
    • Latin: Delfino, Federico. De fluxu et refluxu aquae maris. Venice, 1559. Morphological index for "a"
      • RLP: 234 base forms
      • Donatus: 149 base forms
      • RLP misses: 36%
    • Italian: Borro, Girolamo. Del flusso e reflusso del mare. Lucca, 1561. Morphological index for "e"
      • RLP: 221 base forms
      • Donatus: 132 base forms
      • RLP misses: 40%
    • English: Alberti, Leone Battista. Architecture. London, 1755. Morphological index for "b"
      • RLP: 592 base forms
      • Donatus: 367 base forms
      • RLP misses: 38%
    • German: Johann Grunert. Mathematik und Physik. 1920. Morphological index for "f"
      • RLP: 25 base forms
      • Donatus: 16 base forms
      • RLP misses: 36%
    • French: Galilei, Galileo. Les méchaniques. Paris, 1634. Morphological index for "g"
      • RLP: 71 base forms
      • Donatus: 60 base forms
      • RLP misses: 15%
    • Dutch: Stevin, Simon. De Beghinselen der Weegconst. Leyden, 1586. Morphological index for "d"
      • RLP: 159 base forms
      • Donatus: 142 base forms
      • RLP misses: 11%
    • Greek: Epicurus. Varia. Leipzig, 1887. Morphological index for "s"
      • RLP: 253 base forms
      • Donatus: 241 base forms
      • RLP misses: 5%
    • Arabic: Heron Alexandrinus. Mechanica. Leipzig, 1900. Morphological index for "a"
      • RLP: 330 base forms
      • Donatus: 325 base forms
      • RLP misses: 2%
    • Chinese: no base form reduction
    • overall: RLP misses xx % of the base form reductions compared with Donatus
  • base form reduction of Latin "sunt": comparison of RLP and Donatus (in Benedetti, Giovanni Battista de. Diversarum Speculationum mathematicarum, & physicarum liber. 1585.)
    • RLP: 259 sentence hits
    • Donatus: 1655 sentence hits (with all forms: ens, entibus, entis, eram, eramus, erant, erantque, erat, eratque, erimus, eris, erit, eritin, eritque, eritqueue, ero, erunt, erunt., eruntque, es, esne, esse, essemus, essent, esseque, esset, est, estis, esto, estque, fore, forem, forent, fores, foret, fuam, fuat, fueram, fueramus, fuerant, fueras, fuerat, fuere, fuerim, fuerimus, fuerin, fuerint, fuerintque, fueris, fuerit, fueritne, fueritque, fuero, fuerunt, fui, fuimus, fuisse, fuissent, fuisset, fuit, fuitque, futura, futuram, futurarum, futuras, futuri, futuris, futuro, futurorum, futuros, futurum, futurumst, futurus, sient, siet, sim, simus, sint, sintque, sis, sit, sitis, sitque, sum, sumus, sunt, sunto, suntque)
    • RLP misses: 84%
  • duplicate entries: variants of the same word lead to different base forms (examples)
    • babylonian, babylonians
    • back­doors, back­-doors, back­-door
    • fleisse, fleissigen, fleiß, fleißig
  • orthographic normalization: erroneous base forms (examples)
    • f., fisi-, e@@et, e@t,
    • c.a, c.b, c.d, c.e, c.f, ..., c.sit, ..., c.y, d.c.sit, d-ui, e-tago, fa-cere, face-re
    • ca-liditatem, ca-lor, ca-lorem, ...
    • single characters: a, b, c, ...
  • hit counts: errors (examples)
    • fotografie: 10 hits (actually 5 hits)
    • 編 : 1 hit (actually 15 hits)
  • overall
    • indexing is 60% faster than with Online-Donatus
    • RLP produces many errors (far more than Donatus)
    • it is not platform independent
    • it is not open source
    • it is expensive
    • therefore: RLP will not be used
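
The "RLP misses" percentages above can be reproduced from the listed counts. The following is a minimal sketch, not the script used for the test: the two formulas are inferred from the numbers on this page, and the function names are hypothetical. For the morphological index comparison, the figure appears to be the share of RLP base forms in excess of the Donatus count (RLP reduces fewer word forms to a common base form, so its index is longer). For "sunt", it appears to be the share of Donatus sentence hits that RLP does not find. The pooled value at the end is only one possible way to aggregate the languages; the page itself leaves the overall figure open.

```python
# Counts copied from the comparison above:
# language -> (RLP base forms, Donatus base forms)
BASE_FORMS = {
    "Latin": (234, 149),
    "Italian": (221, 132),
    "English": (592, 367),
    "German": (25, 16),
    "French": (71, 60),
    "Dutch": (159, 142),
    "Greek": (253, 241),
    "Arabic": (330, 325),
}


def excess_base_form_share(rlp: int, donatus: int) -> float:
    """Share of RLP base forms beyond the Donatus count.

    Assumed meaning of "RLP misses" in the index comparison.
    """
    return (rlp - donatus) / rlp


def missed_hit_share(rlp_hits: int, donatus_hits: int) -> float:
    """Share of Donatus sentence hits not found by RLP.

    Assumed meaning of "RLP misses" in the "sunt" comparison.
    """
    return (donatus_hits - rlp_hits) / donatus_hits


# Per-language miss rate, rounded to whole percent as on the page.
per_language = {
    lang: round(100 * excess_base_form_share(rlp, don))
    for lang, (rlp, don) in BASE_FORMS.items()
}

# "sunt" in Benedetti: 259 RLP hits versus 1655 Donatus hits.
sunt_miss = round(100 * missed_hit_share(259, 1655))

# One possible aggregate (an assumption): pool the counts of all languages.
total_rlp = sum(rlp for rlp, _ in BASE_FORMS.values())
total_donatus = sum(don for _, don in BASE_FORMS.values())
pooled_miss = round(100 * excess_base_form_share(total_rlp, total_donatus))

if __name__ == "__main__":
    for lang, pct in per_language.items():
        print(f"{lang}: RLP misses {pct}%")
    print(f"sunt: RLP misses {sunt_miss}%")
    print(f"pooled over all languages: {pooled_miss}%")
```

Pooling weights each language by the size of its sample, so the English sample (592 base forms) dominates; averaging the eight percentages instead would weight every language equally and give a slightly different value.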