Changes between Version 10 and Version 11 of RLP-Test


Timestamp:
Sep 16, 2009, 11:39:15 AM (15 years ago)
Author:
jwillenborg
Comment:

--

  • RLP-Test

    v10 v11  
    22
    33== RLP ==
    4   * RLP: version 6.5.2 (platform dependant)
    5   * RLP-Lucene: version 6.0.0 (Java library: platform independant)
     4  * RLP: version 6.5.2 (platform dependent)
     5  * RLP-Lucene: version 6.0.0 (Java library: platform independent)
    66== Document base ==
    77  * 113 documents, each 1 KB - 18 MB in size
     
    2222      * RLP: 234 base forms
    2323      * Donatus: 149 base forms
     24      * RLP misses: 36%
    2425    * italian: Borro, Girolamo. Del flusso e reflusso del mare. Lucca, 1561. Morphological index for "e"
    2526      * RLP: 221 base forms
    2627      * Donatus: 132 base forms
     28      * RLP misses: 40%
    2729    * english: Alberti, Leone Battista. Architecture. London, 1755. Morphological index for "b"
    2830      * RLP: 592 base forms
    2931      * Donatus: 367 base forms
     32      * RLP misses: 38%
    3033    * german: Johann Grunert. Mathematik und Physik. 1920. Morphological index for "f"
    3134      * RLP: 25 base forms
    3235      * Donatus: 16 base forms
    33     * french: Alberti, Leone Battista. Architecture. London, 1755. Morphological index for "b"
    34       * RLP: 592 base forms
    35       * Donatus: 367 base forms
     36      * RLP misses: 36%
     37    * french: Galilei, Galileo. Les méchaniques. Paris, 1634. Morphological index for "g"
     38      * RLP: 71 base forms
     39      * Donatus: 60 base forms
     40      * RLP misses: 15%
    3641    * dutch: Stevin, Simon. De Beghinselen der Weegconst. Leyden, 1586. Morphological index for "d"
    3742      * RLP: 159 base forms
    3843      * Donatus: 142 base forms
     44      * RLP misses: 11%
     45    * greek: Epicurus. Varia. Leipzig, 1887. Morphological index for "s"
     46      * RLP: 253 base forms
     47      * Donatus: 241 base forms
     48      * RLP misses: 5%
     49    * arabic: Heron Alexandrinus. Mechanica. Leipzig, 1900. Morphological index for "a"
     50      * RLP: 330 base forms
     51      * Donatus: 325 base forms
     52      * RLP misses: 2%
     53    * chinese: no base form reduction
    3954    * overall: RLP misses xx % in base form reduction in contrast to Donatus
     55  * base form reduction of latin "sunt": comparison of RLP and Donatus (in Benedetti, Giovanni Battista de. Diversarum Speculationum mathematicarum, & physicarum liber. 1585.)
     56    * RLP: 259 sentence hits
     57    * Donatus: 1655 sentence hits (with all forms: ens, entibus, entis, eram, eramus, erant, erantque, erat, eratque, erimus, eris, erit, eritin, eritque, eritqueue, ero, erunt, erunt., eruntque, es, esne, esse, essemus, essent, esseque, esset, est, estis, esto, estque, fore, forem, forent, fores, foret, fuam, fuat, fueram, fueramus, fuerant, fueras, fuerat, fuere, fuerim, fuerimus, fuerin, fuerint, fuerintque, fueris, fuerit, fueritne, fueritque, fuero, fuerunt, fui, fuimus, fuisse, fuissent, fuisset, fuit, fuitque, futura, futuram, futurarum, futuras, futuri, futuris, futuro, futurorum, futuros, futurum, futurumst, futurus, sient, siet, sim, simus, sint, sintque, sis, sit, sitis, sitque, sum, sumus, sunt, sunto, suntque)
     58    * RLP misses: 84%
    4059  * double entries: forms of the same word lead to different base forms (examples)
    4160    * babylonian, babylonians
    4261    * back­doors, back­-doors, back­-door
    4362    * fleisse, fleissigen, fleiß, fleißig
    44   * orthographic normalization: error examples
    45     * f., fisi-,
    46     * single characters: a, b,
    47   * count hits: error examples
     63  * orthographic normalization: error base forms (examples)
     64    * f., fisi-, e@@et, e@t,
     65    * c.a, c.b, c.d, c.e, c.f, ..., c.sit, ..., c.y, d.c.sit, d-ui, e-tago, fa-cere, face-re
     66    * ca-liditatem, ca-lor, ca-lorem, ...
     67    * single characters: a, b, c, ...
     68  * count hits: errors (examples)
    4869    * fotografie: 10 hits (actually 5 hits)
     70    * 編 : 1 hit (actually 15 hits)
     71  * overall
     72    * RLP produces many errors (many more than Donatus)
     73    * it is not platform independent
     74    * it is not open software
     75    * it costs a lot of money
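
The page does not say how the "RLP misses" percentages are calculated. The following is a minimal sketch, assuming each figure is the gap between the RLP count and the Donatus count, taken relative to the larger of the two and rounded to the nearest percent. The function name `miss_rate` is hypothetical. The counts are copied from the list above.

```python
def miss_rate(rlp: int, donatus: int) -> int:
    """Gap between the two counts as a rounded percentage of the larger count.

    Assumption: this formula is not stated in the source. It is inferred
    because it matches the figures listed on the page.
    """
    larger = max(rlp, donatus)
    if larger == 0:
        return 0  # nothing to compare
    return round(100 * abs(rlp - donatus) / larger)


# (RLP count, Donatus count) pairs copied from the list above
counts = {
    "italian base forms": (221, 132),
    "dutch base forms": (159, 142),
    "arabic base forms": (330, 325),
    "latin 'sunt' sentence hits": (259, 1655),
}

for name, (rlp, donatus) in counts.items():
    print(f"{name}: RLP misses {miss_rate(rlp, donatus)}%")
```

Under this reading, a higher base form count means that fewer word forms were merged into a common base form, while a lower sentence hit count means that occurrences were not found. In both cases the larger count serves as the denominator.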