This package contains a Tokenizer, Analyzers, and other classes for the integration of RLP with Lucene and Solr.

{@link com.basistech.rlp.lucene.RLPTokenizer} is the core of this package. It makes the results of RLP language analysis available to Lucene through a flexible API.

The package provides a language-neutral Analyzer called RLPAnalyzer and several language-specific Analyzer classes. These use RLPTokenizer and, optionally, RLPPOSFilter. Customers who write to the Lucene API (rather than Solr) can use one of these Analyzer classes, or write their own Analyzer using the provided source code as a reference.

These Analyzers may not work well when used with QueryParser, because query terms are usually too short to analyze accurately; in particular, the part-of-speech tags may be incorrect. You might want to disable part-of-speech filtering by specifying an EnumSet<PostType> that does not include POS. If your users usually enter words in their dictionary form separated by spaces, you might even want to skip RLP altogether and use {@link org.apache.lucene.analysis.WhitespaceAnalyzer}.

Note about stop-word filtering: RLPAnalyzer and RLPXxAnalyzer use RLPPOSFilter instead of the standard Lucene {@link org.apache.lucene.analysis.StopFilter}. If word-based filtering is desired, use RLP's StopWord language processor (LP) by specifying an RLP Context Definition that includes the StopWord LP. For Chinese, Japanese, and Korean, use the stopword removal option of the CLA/JLA/KLA LP instead of the StopWord LP.

Note for Nutch developers: These Analyzers do not work at all with Nutch for query analysis of Chinese and Japanese text. This is because the NutchAnalysis class turns each Chinese or Japanese character into a separate token. To change this behavior, you would need to modify the source code of the NutchAnalysis class (NutchAnalysis.java).
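To make the Nutch problem concrete, the following is a minimal, self-contained sketch of per-character tokenization. It is not Nutch code and does not use the RLP API; the class name CjkPerCharSketch and the choice of Unicode scripts treated as Chinese or Japanese are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: NOT Nutch or RLP code.
class CjkPerCharSketch {

    // Assumption: Han, Hiragana, and Katakana stand in for "Chinese and Japanese characters".
    static boolean isCjk(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
            || script == Character.UnicodeScript.HIRAGANA
            || script == Character.UnicodeScript.KATAKANA;
    }

    // Emits every CJK character as its own token; other text is split on whitespace.
    static List<String> tokenize(String query) {
        List<String> tokens = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i < query.length(); ) {
            int cp = query.codePointAt(i);
            i += Character.charCount(cp);
            if (isCjk(cp)) {
                flush(run, tokens);
                tokens.add(new String(Character.toChars(cp)));
            } else if (Character.isWhitespace(cp)) {
                flush(run, tokens);
            } else {
                run.appendCodePoint(cp);
            }
        }
        flush(run, tokens);
        return tokens;
    }

    // Moves any pending non-CJK run into the token list.
    private static void flush(StringBuilder run, List<String> tokens) {
        if (run.length() > 0) {
            tokens.add(run.toString());
            run.setLength(0);
        }
    }

    public static void main(String[] args) {
        // "\u6771\u4EAC" is the two-character Japanese word for Tokyo.
        System.out.println(tokenize("\u6771\u4EAC tower"));
    }
}
```

Single-character query tokens like these are unlikely to match an index whose terms are whole words produced by linguistic segmentation, which is why the query side has to be changed in NutchAnalysis itself.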

{@link com.basistech.rlp.lucene.RLPPOSFilter} is a TokenFilter that removes Tokens based on the part-of-speech tags that RLPTokenizer attaches. {@link com.basistech.rlp.lucene.RLPAnalyzerDispatcher} is provided primarily for the Lucene demo application, and is not meant to be used for other purposes.
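The idea behind part-of-speech filtering, as performed by RLPPOSFilter, can be sketched in plain Java. This is a conceptual illustration under stated assumptions, not the actual implementation: the PosFilterSketch and TaggedToken types and the tag names (DET, NOUN, VERB, PREP) are hypothetical and do not come from RLP or Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: these types are NOT the RLP or Lucene API.
class PosFilterSketch {

    /** A token text paired with a part-of-speech tag (stand-in for a tagged Token). */
    static final class TaggedToken {
        final String text;
        final String pos;

        TaggedToken(String text, String pos) {
            this.text = text;
            this.pos = pos;
        }
    }

    /** Returns the texts of tokens whose tag is not in stopTags, preserving order. */
    static List<String> filter(List<TaggedToken> tokens, Set<String> stopTags) {
        List<String> kept = new ArrayList<>();
        for (TaggedToken token : tokens) {
            if (!stopTags.contains(token.pos)) {
                kept.add(token.text);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Tag names are invented for this example.
        List<TaggedToken> tokens = Arrays.asList(
            new TaggedToken("the", "DET"),
            new TaggedToken("cat", "NOUN"),
            new TaggedToken("sat", "VERB"),
            new TaggedToken("on", "PREP"),
            new TaggedToken("the", "DET"),
            new TaggedToken("mat", "NOUN"));
        Set<String> stopTags = new HashSet<>(Arrays.asList("DET", "PREP"));
        System.out.println(filter(tokens, stopTags));
    }
}
```

Because the decision depends on the tag rather than on the word itself, a wrong tag on a short query can drop a term that should have been kept, which is the reason for the QueryParser caveat above.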