java.lang.Object
  org.apache.lucene.analysis.Analyzer
    com.basistech.rlp.lucene.RLPAnalyzer
      com.basistech.rlp.lucene.RLPJaAnalyzer
public final class RLPJaAnalyzer
An Analyzer for Japanese that uses RLP.
To use this analyzer, you must have a valid RLP license that enables the JLA language processor.
The default RLP context definition also requires RCLU to be licensed. RCLU performs Unicode
Normalization Form KC (NFKC) and lowercasing.
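RCLU's internals are not documented on this page. To give a rough idea of what NFKC plus lowercasing does to mixed Japanese and Latin text, the sketch below uses the JDK's java.text.Normalizer as a stand-in. It is not RCLU, and RCLU's output may differ in edge cases. The class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

public class NfkcLowercaseDemo {
    // Hypothetical stand-in for the RCLU step: NFKC normalization, then lowercasing.
    // Uses only the JDK; RCLU itself may behave differently.
    static String normalize(String text) {
        String nfkc = Normalizer.normalize(text, Normalizer.Form.NFKC);
        // Locale.ROOT avoids locale-sensitive case mappings (e.g. Turkish dotless i).
        return nfkc.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Full-width Latin letters and digits fold to ASCII, then get lowercased.
        System.out.println(normalize("ＲＬＰ１２３")); // rlp123
        // Half-width katakana folds to full-width katakana.
        System.out.println(normalize("ｶﾀｶﾅ"));     // カタカナ
    }
}
```

Locale.ROOT is a deliberate choice here so that lowercasing does not depend on the JVM's default locale.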
This Analyzer uses RLPTokenizer, LowerCaseFilter, and RLPPOSFilter (the last only if POS generation
is turned on and an allowed POS tag list is provided).
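The order of the chain and the condition on the POS filter can be modeled in plain Java (16+). This is a hedged sketch: the Token record, the PostType enum, and the tag names NOUN, VERB, and PARTICLE are hypothetical stand-ins, not RLP or Lucene types. The real RLPTokenizer, LowerCaseFilter, and RLPPOSFilter operate on Lucene token streams.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class JaChainSketch {
    // Hypothetical stand-ins; not the RLP or Lucene classes.
    enum PostType { STEM, COMP, POS }

    record Token(String text, String pos) {}

    // Mirrors the described order: tokenizer output (given here as input),
    // then lowercasing, then a POS filter that is applied only when POS
    // generation is on AND an allowed tag list was supplied.
    static List<Token> analyze(List<Token> tokenizerOutput,
                               EnumSet<PostType> postTypes,
                               Set<String> allowedPOSTags) {
        boolean posFilterOn = postTypes.contains(PostType.POS) && allowedPOSTags != null;
        List<Token> out = new ArrayList<>();
        for (Token t : tokenizerOutput) {
            Token lowered = new Token(t.text().toLowerCase(Locale.ROOT), t.pos());
            if (posFilterOn && !allowedPOSTags.contains(lowered.pos())) {
                continue; // dropped by the POS filter stage
            }
            out.add(lowered);
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative tokens and tags; real JLA tag names may differ.
        List<Token> tokens = List.of(
            new Token("Lucene", "NOUN"),
            new Token("を", "PARTICLE"),
            new Token("使う", "VERB"));
        // POS enabled and tag list given: the particle is filtered out.
        System.out.println(analyze(tokens, EnumSet.allOf(PostType.class), Set.of("NOUN", "VERB")));
        // POS not enabled: no POS filtering, only lowercasing.
        System.out.println(analyze(tokens, EnumSet.of(PostType.STEM), Set.of("NOUN", "VERB")));
    }
}
```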
Note: Although this class is currently implemented as a subclass of RLPAnalyzer, that is an
implementation detail and may change in the future. The eventual contract is only that it is a
subclass of the Lucene Analyzer.
Constructor Summary | |
---|---|
RLPJaAnalyzer() | This default constructor uses the default RLP context, which includes the RCLU and JLA language processors (LPs). |
RLPJaAnalyzer(String rlpContextDef) | This constructor uses the default set of post types: STEM (actually a lemma), COMP (compound-word decomposition), and POS (part of speech, stored in the Token's payload field). The output of the many-to-one normalizer (formerly called JON) and readings are not used. |
RLPJaAnalyzer(String rlpContextDef, EnumSet<RLPTokenizer.PostType> postTypes) | This constructor uses the part-of-speech filter with the default part-of-speech tag set. |
RLPJaAnalyzer(String rlpContextDef, EnumSet<RLPTokenizer.PostType> postTypes, String[] allowedPOSTags) | This constructor does not use default values. |
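Taken together, the summary says each shorter constructor fills every omitted argument with the matching default, which is what the static getDefault* methods report. The plain-Java model below illustrates that plumbing as telescoping constructors. It is an assumption about behavior, not the actual RLPJaAnalyzer source. JaAnalyzerConfig, the PostType enum, and the placeholder default values are all hypothetical.

```java
import java.util.EnumSet;

public class TelescopingSketch {
    // Hypothetical stand-in for RLPTokenizer.PostType (only the documented defaults).
    enum PostType { STEM, COMP, POS }

    // Hypothetical stand-in for RLPJaAnalyzer; only the argument plumbing is modeled.
    static final class JaAnalyzerConfig {
        final String contextDef;
        final EnumSet<PostType> postTypes;
        final String[] allowedPOSTags;

        // Each shorter constructor supplies the default for the missing argument.
        JaAnalyzerConfig() { this(defaultContextDefinition()); }
        JaAnalyzerConfig(String contextDef) { this(contextDef, defaultPostTypes()); }
        JaAnalyzerConfig(String contextDef, EnumSet<PostType> postTypes) {
            this(contextDef, postTypes, defaultAllowedPOSTags());
        }
        // Fully specified constructor: no defaults applied.
        JaAnalyzerConfig(String contextDef, EnumSet<PostType> postTypes, String[] allowedPOSTags) {
            this.contextDef = contextDef;
            this.postTypes = postTypes;
            this.allowedPOSTags = allowedPOSTags;
        }

        // Placeholder defaults (hypothetical values). In the real class these roles are
        // played by getDefaultContextDefinition(), getDefaultPostTypes(), getDefaultAllowedPOSTags().
        static String defaultContextDefinition() { return "<default-context/>"; }
        static EnumSet<PostType> defaultPostTypes() {
            return EnumSet.of(PostType.STEM, PostType.COMP, PostType.POS);
        }
        static String[] defaultAllowedPOSTags() { return new String[] {"NOUN", "VERB"}; }
    }

    public static void main(String[] args) {
        JaAnalyzerConfig c = new JaAnalyzerConfig("my-context.xml");
        System.out.println(c.contextDef + " " + c.postTypes + " " + String.join(",", c.allowedPOSTags));
    }
}
```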
Method Summary | |
---|---|
static String[] | getDefaultAllowedPOSTags(): Gets the array of part-of-speech (POS) tags that is assumed when a constructor without an allowedPOSTags argument is used. |
static String | getDefaultContextDefinition(): Gets the context definition that is assumed when a constructor without an rlpContextDef argument is used. |
static EnumSet<RLPTokenizer.PostType> | getDefaultPostTypes(): Gets the set of post types that is assumed when a constructor without a postTypes argument is used. |
static void | main(String[] args): (Internal use only) Tokenizes a Japanese sentence and displays the results. |
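A common use of getDefaultPostTypes() is to start from the defaults and adjust them before passing the set as the postTypes argument of RLPJaAnalyzer(String, EnumSet). The hedged sketch below does this in plain Java with a hypothetical PostType enum and a placeholder for the default set. Copying with EnumSet.copyOf is a defensive choice, because this page does not say whether the returned set may be modified safely.

```java
import java.util.EnumSet;

public class PostTypeCustomization {
    // Hypothetical stand-in for RLPTokenizer.PostType with the documented defaults.
    enum PostType { STEM, COMP, POS }

    // Placeholder for RLPJaAnalyzer.getDefaultPostTypes().
    static EnumSet<PostType> defaultPostTypes() {
        return EnumSet.of(PostType.STEM, PostType.COMP, PostType.POS);
    }

    // Start from the defaults, but work on a copy so the source set is untouched.
    static EnumSet<PostType> withoutCompounds() {
        EnumSet<PostType> custom = EnumSet.copyOf(defaultPostTypes());
        custom.remove(PostType.COMP); // skip compound-word decomposition
        return custom;
    }

    public static void main(String[] args) {
        System.out.println(withoutCompounds()); // [STEM, POS]
    }
}
```

Removing POS instead of COMP would, per the class description, also mean the POS filter is not applied.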
Methods inherited from class com.basistech.rlp.lucene.RLPAnalyzer |
---|
getDetectedLanguage, tokenStream |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public RLPJaAnalyzer(String rlpContextDef, EnumSet<RLPTokenizer.PostType> postTypes, String[] allowedPOSTags)
Parameters:
rlpContextDef - Context definition that RLP uses to process text: an XML string or the path to an XML file.
postTypes - RLP result types for which the tokenizer generates tokens.
allowedPOSTags - POS tags of the tokens that POSTagFilter accepts.
See Also:
RLPAnalyzer.RLPAnalyzer(LanguageCode, String, EnumSet, String[])
public RLPJaAnalyzer(String rlpContextDef, EnumSet<RLPTokenizer.PostType> postTypes)
Parameters:
rlpContextDef - Context definition that RLP uses to process text: an XML string or the path to an XML file.
postTypes - RLP result types for which the tokenizer generates tokens.
See Also:
RLPAnalyzer.RLPAnalyzer(LanguageCode, String, EnumSet)
public RLPJaAnalyzer(String rlpContextDef)
Parameters:
rlpContextDef - Context definition that RLP uses to process text: an XML string or the path to an XML file.
See Also:
RLPAnalyzer.RLPAnalyzer(LanguageCode, String)
public RLPJaAnalyzer()
See Also:
RLPAnalyzer.RLPAnalyzer(LanguageCode)
Method Detail |
---|
public static String getDefaultContextDefinition()
public static String[] getDefaultAllowedPOSTags()
public static EnumSet<RLPTokenizer.PostType> getDefaultPostTypes()
public static void main(String[] args)
Parameters:
args - A Japanese sentence (args[0]). If you omit the argument, a default sentence is processed.