java.lang.Object
Analyzer
com.basistech.rlp.lucene.RLPAnalyzer
public class RLPAnalyzer
A generic (language-neutral) Analyzer that uses RLPTokenizer.
This Analyzer uses RLPTokenizer, LowerCaseFilter, and RLPPOSFilter (the last only if POS generation is turned on and an allowed POS tag list is provided).
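The conditional chain described above can be pictured with a small, self-contained simulation. It does not call RLP or Lucene: `Tok`, `FilterChainSketch`, and `analyze` are hypothetical stand-ins, and the POS tag strings are made up for illustration. It only mirrors the stated rule that the POS filter takes part when POS generation is on and an allowed-tag list is supplied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a token that carries a part-of-speech tag.
// This is not an RLP or Lucene type.
final class Tok {
    final String text;
    final String pos;

    Tok(String text, String pos) {
        this.text = text;
        this.pos = pos;
    }
}

final class FilterChainSketch {
    // Simulates the described chain: every token is lower-cased, and the POS
    // filter is applied only when POS generation is enabled AND an allowed-tag
    // list was supplied.
    static List<String> analyze(List<Tok> tokens, boolean posEnabled, String[] allowedPosTags) {
        boolean usePosFilter = posEnabled && allowedPosTags != null;
        Set<String> allowed = usePosFilter
                ? new HashSet<>(Arrays.asList(allowedPosTags))
                : null;

        List<String> out = new ArrayList<>();
        for (Tok t : tokens) {
            String lowered = t.text.toLowerCase(Locale.ROOT);
            if (usePosFilter && !allowed.contains(t.pos)) {
                continue; // dropped by the simulated POS filter
            }
            out.add(lowered);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> tokens = Arrays.asList(
                new Tok("The", "DET"), new Tok("Quick", "ADJ"), new Tok("Fox", "NOUN"));
        // POS on, tag list given: only ADJ and NOUN tokens survive.
        System.out.println(analyze(tokens, true, new String[] {"NOUN", "ADJ"}));
        // POS on, no tag list: nothing is filtered.
        System.out.println(analyze(tokens, true, null));
        // POS off: the tag list is ignored.
        System.out.println(analyze(tokens, false, new String[] {"NOUN"}));
    }
}
```

In this sketch the filter decision is made once, up front, so the per-token loop stays the same whether or not the filter is active.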
Constructor Summary | |
---|---|
RLPAnalyzer() | Equivalent to RLPAnalyzer(LanguageCode#UNKNOWN). |
RLPAnalyzer(LanguageCode lang) | Same as the two-parameter constructor, except that it uses an RLP Context that includes RCLU and the base linguistic LPs for all supported languages. |
RLPAnalyzer(LanguageCode lang, String rlpContextDef) | Same as the three-parameter constructor, except that Tokens are generated from all available result types, without POS tags in their payload. |
RLPAnalyzer(LanguageCode lang, String rlpContextDef, EnumSet<RLPTokenizer.PostType> postTypes) | No POS filter is used with this constructor. |
RLPAnalyzer(LanguageCode lang, String rlpContextDef, EnumSet<RLPTokenizer.PostType> postTypes, String[] allowedPOSTags) | The most flexible constructor; takes four arguments. |
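One way to read the "same as the N-parameter constructor" wording is as a chain of telescoping constructors, each supplying one default and delegating to the next. The sketch below illustrates that pattern only; it is not the RLPAnalyzer source. `TelescopingSketch`, `Lang`, `PostKind`, and `DEFAULT_CONTEXT` are hypothetical, and modeling "all result types without POS tags" as "every value except POS" is an assumption.

```java
import java.util.EnumSet;

final class TelescopingSketch {
    // Hypothetical stand-ins for LanguageCode and RLPTokenizer.PostType.
    // Only UNKNOWN and POS are named in this document; the rest are invented.
    enum Lang { UNKNOWN, ENGLISH }
    enum PostKind { TOKEN, STEM, POS }

    // Placeholder; the real default is whatever getDefaultContextDefinition() returns.
    static final String DEFAULT_CONTEXT = "<default context placeholder>";

    final Lang lang;
    final String contextDef;
    final EnumSet<PostKind> postTypes;
    final String[] allowedPosTags;

    TelescopingSketch() {
        this(Lang.UNKNOWN); // no-arg form: language left to auto-detection
    }

    TelescopingSketch(Lang lang) {
        this(lang, DEFAULT_CONTEXT); // default context definition
    }

    TelescopingSketch(Lang lang, String contextDef) {
        // Assumption: default post types are everything except POS.
        this(lang, contextDef, EnumSet.complementOf(EnumSet.of(PostKind.POS)));
    }

    TelescopingSketch(Lang lang, String contextDef, EnumSet<PostKind> postTypes) {
        this(lang, contextDef, postTypes, null); // null: no POS filter
    }

    TelescopingSketch(Lang lang, String contextDef,
                      EnumSet<PostKind> postTypes, String[] allowedPosTags) {
        this.lang = lang;
        this.contextDef = contextDef;
        this.postTypes = postTypes;
        this.allowedPosTags = allowedPosTags;
    }

    public static void main(String[] args) {
        TelescopingSketch a = new TelescopingSketch();
        System.out.println(a.lang + " " + a.postTypes + " " + (a.allowedPosTags == null));
    }
}
```

With this layout every default lives in exactly one constructor, so the no-argument form and the one-argument form called with `UNKNOWN` end up with identical settings.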
Method Summary | |
---|---|
static String getDefaultContextDefinition() | Gets the RLP Context Definition string that is assumed when a constructor that doesn't take one is used. |
static EnumSet<RLPTokenizer.PostType> getDefaultPostTypes() | Gets the set of post types that is assumed when a constructor that doesn't take a post types argument is used. |
LanguageCode getDetectedLanguage() | Returns the language detected by RLI, if it is enabled. |
TokenStream tokenStream(String fieldName, Reader reader) | An implementation of Analyzer#tokenStream(String, Reader). |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public RLPAnalyzer(LanguageCode lang, String rlpContextDef, EnumSet<RLPTokenizer.PostType> postTypes, String[] allowedPOSTags)
lang - The language of the text. If it is LanguageCode#UNKNOWN, RLI is used to auto-detect the language. To use RLI, rlpContextDef must include RLI, and the RLI feature must be licensed.
rlpContextDef - Either the path to a file containing the definition, or the definition itself in XML form.
postTypes - Essentially a bit vector that specifies which RLP result types should generate tokens. Specify EnumSet.allOf(PostType.class) to enable all possible token types with part-of-speech tags.
allowedPOSTags - List of part-of-speech tags that should not be filtered out; null means no POS filtering. Note that part-of-speech tags are language dependent. This argument is ignored, and no POS filter is used, if postTypes does not include POS.

public RLPAnalyzer(LanguageCode lang, String rlpContextDef, EnumSet<RLPTokenizer.PostType> postTypes)
lang - The language of the text. If it is LanguageCode#UNKNOWN, RLI is used to auto-detect the language. To use RLI, rlpContextDef must include RLI, and the RLI feature must be licensed.
rlpContextDef - Either the path to a file containing the definition, or the definition itself in XML form.
postTypes - Essentially a bit vector that specifies which RLP result types should generate tokens. Specify EnumSet.allOf(PostType.class) to enable all possible token types with part-of-speech tags.

public RLPAnalyzer(LanguageCode lang, String rlpContextDef)
lang - The language of the text. If it is LanguageCode#UNKNOWN, RLI is used to auto-detect the language. To use RLI, rlpContextDef must include RLI, and the RLI feature must be licensed.
rlpContextDef - Either the path to a file containing the definition, or the definition itself in XML form.

public RLPAnalyzer(LanguageCode lang)
lang - The language of the text. If it is LanguageCode#UNKNOWN, RLI is used to auto-detect the language. To use RLI, rlpContextDef must include RLI, and the RLI feature must be licensed.

public RLPAnalyzer()
Equivalent to RLPAnalyzer(LanguageCode#UNKNOWN). The RLI feature must be licensed.
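The interaction between postTypes and allowedPOSTags described in the parameter notes can be illustrated with plain `java.util.EnumSet`. This is a minimal sketch under stated assumptions: `PostTypesSketch`, `PostKind`, and `effectiveAllowedTags` are hypothetical names, and only the POS value is taken from this document.

```java
import java.util.Arrays;
import java.util.EnumSet;

final class PostTypesSketch {
    // Hypothetical stand-in for RLPTokenizer.PostType; only POS is named in this document.
    enum PostKind { TOKEN, STEM, POS }

    // Returns the tag list a POS filter would use, or null when no filter applies:
    // either no list was given, or postTypes does not include POS.
    static String[] effectiveAllowedTags(EnumSet<PostKind> postTypes, String[] allowedPosTags) {
        if (allowedPosTags == null || !postTypes.contains(PostKind.POS)) {
            return null;
        }
        return allowedPosTags.clone();
    }

    public static void main(String[] args) {
        String[] tags = {"NOUN", "VERB"}; // made-up tags; real tags are language dependent

        // Every kind enabled, including POS: the tag list takes effect.
        EnumSet<PostKind> all = EnumSet.allOf(PostKind.class);
        System.out.println(Arrays.toString(effectiveAllowedTags(all, tags)));

        // POS not requested: the tag list is ignored.
        EnumSet<PostKind> noPos = EnumSet.of(PostKind.TOKEN, PostKind.STEM);
        System.out.println(Arrays.toString(effectiveAllowedTags(noPos, tags)));
    }
}
```

An `EnumSet` behaves like a typed bit vector, which matches the "essentially a bit vector" description of postTypes: membership tests such as `contains(PostKind.POS)` are the natural way to check whether POS output was requested.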
Method Detail |
---|
public static String getDefaultContextDefinition()
public static EnumSet<RLPTokenizer.PostType> getDefaultPostTypes()
public TokenStream tokenStream(String fieldName, Reader reader)
An implementation of Analyzer#tokenStream(String, Reader).
fieldName - Not used.
reader - Input reader.
public LanguageCode getDetectedLanguage()