
Version 10 (modified by jwillenborg, 15 years ago)

--

Test of RLP (Rosette Linguistics Platform)

RLP

  • RLP: version 6.5.2 (platform dependent)
  • RLP-Lucene: version 6.0.0 (Java library: platform independent)

Document base

  • 113 documents, each between 1 KB and 18 MB in size
  • languages: Latin, Italian, English, German, French, Dutch, Greek, Arabic, Chinese

Hardware, operating system

  • Mac Pro, Dual Core Intel Xeon 2.66 GHz, 4 GB RAM
  • Mac OS X 10.5.4

Indexing

Result / Quality of indexing (random samples)

  • application: see MPDL prototype with RLP analyzer (access only within MPIWG network)
  • online example: RLP base form reduction (morphological index lookup in a document) for "a" in Delfino, Federico. De fluxu et refluxu aquae maris. Venice, 1559
  • base form reduction: comparison of RLP and Donatus
    • Latin: Delfino, Federico. De fluxu et refluxu aquae maris. Venice, 1559. Morphological index for "a"
      • RLP: 234 base forms
      • Donatus: 149 base forms
    • Italian: Borro, Girolamo. Del flusso e reflusso del mare. Lucca, 1561. Morphological index for "e"
      • RLP: 221 base forms
      • Donatus: 132 base forms
    • English: Alberti, Leone Battista. Architecture. London, 1755. Morphological index for "b"
      • RLP: 592 base forms
      • Donatus: 367 base forms
    • German: Johann Grunert. Mathematik und Physik. 1920. Morphological index for "f"
      • RLP: 25 base forms
      • Donatus: 16 base forms
    • French: Alberti, Leone Battista. Architecture. London, 1755. Morphological index for "b"
      • RLP: 592 base forms
      • Donatus: 367 base forms
    • Dutch: Stevin, Simon. De Beghinselen der Weegconst. Leyden, 1586. Morphological index for "d"
      • RLP: 159 base forms
      • Donatus: 142 base forms
    • overall: compared with Donatus, RLP misses xx % of base form reductions
  • double entries: the same word form leads to different base forms; examples:
    • babylonian, babylonians
    • backdoors, back-doors, back-door
    • fleisse, fleissigen, fleiß, fleißig
  • orthographic normalization: error examples
    • f., fisi-,
    • single characters: a, b,
  • hit counts: error examples
    • fotografie: 10 hits reported (actually 5 hits)
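
The base form counts in the RLP/Donatus comparison can be put side by side with a short script. The following is only a sketch: the counts are copied from the list above, while the metric (the share of RLP base forms that exceed the Donatus count for the same index) and the names `COUNTS`, `surplus_percent` and `summarize` are assumptions made for illustration. It is not the method behind the "xx %" figure, which remains open.

```python
# Base form counts copied from the comparison list above.
# Keys are (language, index letter); values are (RLP count, Donatus count).
COUNTS = {
    ("Latin", "a"): (234, 149),
    ("Italian", "e"): (221, 132),
    ("English", "b"): (592, 367),
    ("German", "f"): (25, 16),
    ("French", "b"): (592, 367),  # same figures as the English entry in the list
    ("Dutch", "d"): (159, 142),
}


def surplus_percent(rlp: int, donatus: int) -> float:
    """Share of RLP base forms in excess of the Donatus count, in percent.

    Assumed metric: more distinct base forms for the same index means that
    fewer word forms were reduced to a common base form.
    """
    if rlp <= 0:
        raise ValueError("RLP count must be positive")
    return 100.0 * (rlp - donatus) / rlp


def summarize(counts: dict) -> dict:
    """Return {language: surplus in percent, rounded to one decimal}."""
    return {
        language: round(surplus_percent(rlp, donatus), 1)
        for (language, _letter), (rlp, donatus) in counts.items()
    }


if __name__ == "__main__":
    for language, pct in summarize(COUNTS).items():
        print(f"{language}: {pct} % of RLP base forms exceed the Donatus count")
```

The per-language values are not averaged here, because the samples differ widely in size (25 base forms for German, 592 for English); a weighted figure would need a decision on how the samples should count.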