java.lang.Object
BaseTokenizerFactory
com.basistech.rlp.solr.RLPTokenizerFactory
public class RLPTokenizerFactory
Solr tokenizer factory for RLPTokenizer. Specify this class name in the <tokenizer> element in schema.xml. The following attributes are recognized:
Constructor Summary | |
---|---|
RLPTokenizerFactory()
|
Method Summary | |
---|---|
TokenStream |
create(Reader input)
Creates a token stream using RLPTokenizer. |
LanguageCode |
getLanguage()
Retrieves the language code that has been set (LanguageCode.UNKNOWN if it hasn't been set). |
String |
getRLPContextDef()
Retrieves the path to the RLP XML context definition file or string. |
void |
inform(ResourceLoader loader)
An implementation of ResourceLoaderAware#inform(ResourceLoader) as required by the interface. |
boolean |
isPostCompoundComponents()
Determines whether component tokens are generated for each compound word (German, Dutch, Hungarian, Chinese, Japanese, and Korean). |
boolean |
isPostLemma()
Determines whether a token is generated for each lemma (Arabic only). |
boolean |
isPostM1NormalizedToken()
Determines whether a token is generated for each many-to-one normalized word (Japanese and other languages). |
boolean |
isPostNormalizedToken()
Determines whether a token is generated for each normalized word (Arabic only). |
boolean |
isPostPartOfSpeech()
Determines whether a part-of-speech (POS) is stored in each Token's Payload field. |
boolean |
isPostReadings()
Determines whether reading Tokens are generated for each word that the language analyzer can predict (Chinese and Japanese). |
boolean |
isPostRoot()
Determines whether a token is generated for each root (Arabic only). |
boolean |
isPostStem()
Determines whether a token is generated for each stem. |
boolean |
isPostWord()
Determines whether a token is generated for each word in its original form in the text. |
void |
setLanguage(LanguageCode language)
Designates the language of the text to be processed. |
void |
setPostCompoundComponents(boolean b)
Specifies whether component tokens are generated for each compound word. |
void |
setPostLemma(boolean b)
Specifies whether a token is generated for each lemma (Arabic only). |
void |
setPostM1NormalizedToken(boolean b)
Specifies whether a token is generated for each many-to-one normalized word (Japanese and other languages). |
void |
setPostNormalizedToken(boolean b)
Specifies whether a token is generated for each normalized word (Arabic only). |
void |
setPostPartOfSpeech(boolean b)
Specifies whether a part-of-speech (POS) is stored in each Token's Payload field. |
void |
setPostReadings(boolean b)
Specifies whether reading Tokens are generated for each word that the language analyzer can predict (Chinese and Japanese). |
void |
setPostRoot(boolean b)
Specifies whether a token is generated for each root (Arabic only). |
void |
setPostStem(boolean b)
Specifies whether a token is generated for each stem. |
void |
setPostWord(boolean b)
Specifies whether to generate a token for each of the original words in the text. |
void |
setRLPContextDef(String rlpContextDef)
Sets the path to the RLP XML context definition file, or the context definition as a string (a string must start with "<"). |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
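Several of the flags in the method summary are tied to specific languages. As a reading aid, the sketch below collects only the restrictions stated above into a lookup. It does not use the RLP or Solr classes: `PostOptionGuide`, `Lang`, and `PostOption` are hypothetical names invented for this illustration, and flags without a closed language list in the summary (word, stem, part of speech, many-to-one normalized token) are deliberately left out.

```java
import java.util.EnumSet;
import java.util.Set;

public class PostOptionGuide {

    /** Languages named in the method summary. Hypothetical enum, not the library's LanguageCode. */
    enum Lang { ARABIC, CHINESE, DUTCH, GERMAN, HUNGARIAN, JAPANESE, KOREAN, OTHER }

    /** One entry per flag that the summary ties to a closed list of languages. */
    enum PostOption {
        LEMMA(EnumSet.of(Lang.ARABIC)),
        NORMALIZED_TOKEN(EnumSet.of(Lang.ARABIC)),
        ROOT(EnumSet.of(Lang.ARABIC)),
        READINGS(EnumSet.of(Lang.CHINESE, Lang.JAPANESE)),
        COMPOUND_COMPONENTS(EnumSet.of(Lang.GERMAN, Lang.DUTCH, Lang.HUNGARIAN,
                                       Lang.CHINESE, Lang.JAPANESE, Lang.KOREAN));

        private final Set<Lang> languages;

        PostOption(Set<Lang> languages) {
            this.languages = languages;
        }

        /** True when the summary lists this language for the flag. */
        boolean appliesTo(Lang lang) {
            return languages.contains(lang);
        }
    }

    /** Returns the language-restricted flags that the summary lists for the given language. */
    static Set<PostOption> restrictedOptionsFor(Lang lang) {
        Set<PostOption> result = EnumSet.noneOf(PostOption.class);
        for (PostOption option : PostOption.values()) {
            if (option.appliesTo(lang)) {
                result.add(option);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(restrictedOptionsFor(Lang.JAPANESE)); // [READINGS, COMPOUND_COMPONENTS]
        System.out.println(restrictedOptionsFor(Lang.ARABIC));   // [LEMMA, NORMALIZED_TOKEN, ROOT]
    }
}
```

Each entry corresponds to one setter above, for example `READINGS` to `setPostReadings(boolean b)`. Whether enabling a flag for an unlisted language is an error or simply has no effect is not stated on this page.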
Constructor Detail |
---|
public RLPTokenizerFactory()
Method Detail |
---|
public TokenStream create(Reader input)
Creates a token stream using RLPTokenizer.
input - The input reader.
public LanguageCode getLanguage()
public void setLanguage(LanguageCode language)
language - The language of the text.
public String getRLPContextDef()
public void setRLPContextDef(String rlpContextDef)
rlpContextDef - Path to the RLP context definition file, or the context definition as a string.
public boolean isPostWord()
public void setPostWord(boolean b)
b - Pass true to generate a token for each of the original words in the text. (Initial value: true)
public boolean isPostNormalizedToken()
public void setPostNormalizedToken(boolean b)
b - Pass true to generate a token for each normalized word.
public boolean isPostM1NormalizedToken()
public void setPostM1NormalizedToken(boolean b)
b - Pass true to generate a token for each many-to-one normalized word.
public boolean isPostStem()
public void setPostStem(boolean b)
b - Pass true to generate a token for each stem.
public boolean isPostLemma()
public void setPostLemma(boolean b)
b - Pass true to generate a token for each lemma.
public boolean isPostRoot()
public void setPostRoot(boolean b)
b - Pass true to generate a token for each root.
public boolean isPostPartOfSpeech()
public void setPostPartOfSpeech(boolean b)
b - Pass true to store a POS tag in each Token's Payload field.
public boolean isPostCompoundComponents()
public void setPostCompoundComponents(boolean b)
b - Pass true to generate component Tokens for each compound word (German, Dutch, Hungarian, Chinese, Japanese, and Korean).
public boolean isPostReadings()
public void setPostReadings(boolean b)
b - Pass true to generate reading Tokens.
public void inform(ResourceLoader loader)
An implementation of ResourceLoaderAware#inform(ResourceLoader) as required by the interface.
loader - See ResourceLoaderAware#inform(ResourceLoader).
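The `rlpContextDef` value accepted by `setRLPContextDef(String rlpContextDef)` can be either a file path or an XML string, and the method summary says a string must start with "<". The sketch below shows one way a caller might check which form a value takes before passing it in. It is an illustration under stated assumptions, not the factory's own logic: `ContextDefKind` and `isInlineXml` are hypothetical names, the sample path and XML element are placeholders, and the strict check (no tolerance for leading whitespace) is an assumption taken from the wording of the summary.

```java
public class ContextDefKind {

    /**
     * Returns true when the value would be read as an inline XML context
     * definition, i.e. it starts with "<". Anything else is treated as a path.
     * Hypothetical helper; the real factory may apply different rules.
     */
    static boolean isInlineXml(String rlpContextDef) {
        return rlpContextDef != null && rlpContextDef.startsWith("<");
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        String asPath = "/path/to/context-definition.xml";
        String asXml = "<example-context/>";

        System.out.println(isInlineXml(asPath)); // false
        System.out.println(isInlineXml(asXml));  // true
    }
}
```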