Tools for creating Lucene indices
This project aims to develop tools for creating full-text indices of documents OCRed with Tesseract and OCRopus. These tools cover the following features:
- Integrating Donatus Language Technologies for creating and searching a Lucene index
- Indexing OCRopus-generated documents so that hits can be displayed on the original image using digilib.
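
To illustrate the second feature, here is a minimal sketch of how word positions from OCR output could be kept alongside indexed terms so that a hit can later be located on the page image. It is not the project's actual code: it assumes the OCR output is available as hOCR with `bbox` attributes, it uses a plain Python dictionary as a stand-in for a Lucene index, and the sample markup, the regular expressions, and the `build_index` function are all hypothetical. How the resulting regions would be passed to digilib is not covered here.

```python
import re
from collections import defaultdict

# Hypothetical sample: hOCR word spans as an OCR engine might emit them.
SAMPLE_HOCR = """
<div class='ocr_page' title='bbox 0 0 1000 2000'>
  <span class='ocrx_word' title='bbox 100 200 300 260'>Lucene</span>
  <span class='ocrx_word' title='bbox 320 200 500 260'>index</span>
  <span class='ocrx_word' title='bbox 100 400 280 460'>lucene</span>
</div>
"""

# Simplistic patterns for this sample only; real hOCR needs an HTML parser.
PAGE_RE = re.compile(r"class='ocr_page' title='bbox (\d+) (\d+) (\d+) (\d+)'")
WORD_RE = re.compile(
    r"<span class='ocrx_word' title='bbox (\d+) (\d+) (\d+) (\d+)'>([^<]+)</span>"
)


def build_index(hocr):
    """Map each lowercased word to its regions as (x, y, width, height),
    expressed as fractions of the page size."""
    page = PAGE_RE.search(hocr)
    if page is None:
        raise ValueError("no ocr_page element with a bbox found")
    _, _, page_w, page_h = (int(v) for v in page.groups())

    index = defaultdict(list)
    for match in WORD_RE.finditer(hocr):
        x0, y0, x1, y1 = (int(v) for v in match.groups()[:4])
        term = match.group(5).lower()
        index[term].append(
            (x0 / page_w, y0 / page_h, (x1 - x0) / page_w, (y1 - y0) / page_h)
        )
    return dict(index)


index = build_index(SAMPLE_HOCR)
hits = index.get("lucene", [])  # two regions, one per occurrence on the page
```

Storing the regions as fractions of the page size keeps them independent of the resolution at which the image is later displayed.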
Zope and Python tools for accessing these indices can be found at https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/luceneToolsPython
Examples for the use of the Zope product can be found at http://itgroup.mpiwg-berlin.mpg.de/experimental/Searching/OCRfulltext.