com.basistech.rlp.lucene
Class RLPEnAnalyzer

java.lang.Object
  extended by Analyzer
      extended by com.basistech.rlp.lucene.RLPAnalyzer
          extended by com.basistech.rlp.lucene.RLPEnAnalyzer

public class RLPEnAnalyzer
extends RLPAnalyzer

An Analyzer for English that uses RLP. To use this analyzer, you must have a valid RLP license that enables the Base Linguistics language processor for European languages (BL1 LP).
This Analyzer uses RLPTokenizer, LowerCaseFilter, and RLPPOSFilter (only if POS generation is turned on and the allowed POS tag list is provided).
Note: RLPEnAnalyzer is currently implemented as a subclass of RLPAnalyzer, but this is an implementation detail that may change in the future. The eventual contract is only that it is a subclass of the Lucene Analyzer.
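The three stages described above can be modeled in plain Java. The following is a minimal sketch, not RLP or Lucene code: the Tok class, the method names, and the tag names ("DET", "NOUN", "VERB") are hypothetical stand-ins chosen for illustration. It shows only the ordering (tokens, then lower-casing, then an optional POS filter) and the condition under which the POS filter is applied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class PipelineSketch {
    /** Hypothetical stand-in for a token: its text plus a part-of-speech tag. */
    static final class Tok {
        final String text;
        final String pos;
        Tok(String text, String pos) { this.text = text; this.pos = pos; }
        @Override public String toString() { return text + "/" + pos; }
    }

    /** Lower-cases each token's text (the role LowerCaseFilter plays in the chain). */
    static List<Tok> lowerCase(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            out.add(new Tok(t.text.toLowerCase(Locale.ROOT), t.pos));
        }
        return out;
    }

    /** Keeps only tokens with an allowed tag; a no-op when POS is off or no list is given. */
    static List<Tok> posFilter(List<Tok> in, boolean posEnabled, String[] allowed) {
        if (!posEnabled || allowed == null) {
            return in;
        }
        Set<String> ok = new HashSet<>(Arrays.asList(allowed));
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            if (ok.contains(t.pos)) {
                out.add(t);
            }
        }
        return out;
    }

    /** Runs the stages in the order the class description lists them. */
    static List<Tok> analyze(List<Tok> tokens, boolean posEnabled, String[] allowed) {
        return posFilter(lowerCase(tokens), posEnabled, allowed);
    }

    public static void main(String[] args) {
        // Pre-tokenized input stands in for the output of the tokenizer stage.
        List<Tok> tokens = Arrays.asList(
                new Tok("The", "DET"), new Tok("Dogs", "NOUN"), new Tok("bark", "VERB"));
        System.out.println(analyze(tokens, true, new String[] {"NOUN", "VERB"}));
        System.out.println(analyze(tokens, false, null));
    }
}
```

With the real library, tokenization and tagging are performed by RLP; the sketch starts from already-tagged tokens only to keep the example self-contained.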


Constructor Summary
RLPEnAnalyzer()
          This default constructor uses a default RLP Context, which includes only the BL1 LP, along with the default set of post types and the default POS tags for English processing.
RLPEnAnalyzer(String rlpContextDef)
          This constructor uses the default set of post types: STEM (which is actually a lemma) and POS (the part of speech, stored in the Token's payload field).
RLPEnAnalyzer(String rlpContextDef, EnumSet<RLPTokenizer.PostType> postTypes)
          This constructor uses the part-of-speech filter with the default part-of-speech tag set.
RLPEnAnalyzer(String rlpContextDef, EnumSet<RLPTokenizer.PostType> postTypes, String[] allowedPOSTags)
          This constructor takes every setting as an argument and does not use default values.
 
Method Summary
static String[] getDefaultAllowedPOSTags()
          Gets the array of part-of-speech (POS) tags that is assumed when a constructor without such an argument is used.
static String getDefaultContextDefinition()
          Gets the default context definition, which only contains the BL1 LP.
static EnumSet<RLPTokenizer.PostType> getDefaultPostTypes()
          Gets the set of post types that is assumed when a constructor without such an argument is used.
static void main(String[] args)
          (Internal use only) Tokenizes an English sentence and displays the results.
 
Methods inherited from class com.basistech.rlp.lucene.RLPAnalyzer
getDetectedLanguage, tokenStream
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RLPEnAnalyzer

public RLPEnAnalyzer(String rlpContextDef,
                     EnumSet<RLPTokenizer.PostType> postTypes,
                     String[] allowedPOSTags)
This constructor takes every setting as an argument and does not use default values.

Parameters:
rlpContextDef - Context definition that RLP uses to process text: an XML string or path to XML file.
postTypes - RLP Result types for which the tokenizer will generate tokens.
allowedPOSTags - POSTagFilter will accept tokens with these POS tags.
See Also:
RLPAnalyzer.RLPAnalyzer(LanguageCode, String, EnumSet, String[])

RLPEnAnalyzer

public RLPEnAnalyzer(String rlpContextDef,
                     EnumSet<RLPTokenizer.PostType> postTypes)
This constructor uses the part-of-speech filter with the default part-of-speech tag set.

Parameters:
rlpContextDef - Context definition that RLP uses to process text: an XML string or path to XML file.
postTypes - RLP Result types for which the tokenizer will generate tokens.
See Also:
RLPAnalyzer.RLPAnalyzer(LanguageCode, String, EnumSet)

RLPEnAnalyzer

public RLPEnAnalyzer(String rlpContextDef)
This constructor uses the default set of post types: STEM (which is actually a lemma) and POS (the part of speech, stored in the Token's payload field).

Parameters:
rlpContextDef - Context definition that RLP uses to process text: an XML string or path to XML file.
See Also:
RLPAnalyzer.RLPAnalyzer(LanguageCode, String)

RLPEnAnalyzer

public RLPEnAnalyzer()
This default constructor uses a default RLP Context, which includes only the BL1 LP, along with the default set of post types and the default POS tags for English processing.

See Also:
RLPAnalyzer.RLPAnalyzer(LanguageCode)
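
The four constructors above form a chain in which each shorter form fills in one more default. The sketch below models that pattern with a hypothetical class; it is not the RLPEnAnalyzer source. The PostType enum is a stand-in for RLPTokenizer.PostType, and the default context string and tag names are placeholders, because the real values are not shown on this page. The real class may well delegate to the RLPAnalyzer constructors listed under See Also rather than to its own.

```java
import java.util.EnumSet;

public class ConstructorChainSketch {
    /** Stand-in for RLPTokenizer.PostType, limited to the names this page mentions. */
    enum PostType { WORD, STEM, POS }

    // Placeholder defaults; the real context definition and tag list are not reproduced here.
    static final String DEFAULT_CONTEXT = "<placeholder-context-with-BL1-only/>";
    static final String[] DEFAULT_TAGS = {"TAG_A", "TAG_B"};

    final String contextDef;
    final EnumSet<PostType> postTypes;
    final String[] allowedPosTags;

    /** No arguments: default context, default post types, default tags. */
    ConstructorChainSketch() {
        this(DEFAULT_CONTEXT);
    }

    /** Caller supplies the context; post types default to STEM and POS. */
    ConstructorChainSketch(String contextDef) {
        this(contextDef, EnumSet.of(PostType.STEM, PostType.POS));
    }

    /** Caller supplies context and post types; the allowed tags default. */
    ConstructorChainSketch(String contextDef, EnumSet<PostType> postTypes) {
        this(contextDef, postTypes, DEFAULT_TAGS.clone());
    }

    /** Every setting is explicit; nothing is defaulted. */
    ConstructorChainSketch(String contextDef, EnumSet<PostType> postTypes, String[] allowedPosTags) {
        this.contextDef = contextDef;
        this.postTypes = postTypes;
        this.allowedPosTags = allowedPosTags;
    }

    public static void main(String[] args) {
        ConstructorChainSketch byDefault = new ConstructorChainSketch();
        ConstructorChainSketch explicit = new ConstructorChainSketch(
                "my-context.xml", EnumSet.of(PostType.WORD), new String[] {"TAG_A"});
        System.out.println(byDefault.postTypes + " / " + byDefault.allowedPosTags.length + " tags");
        System.out.println(explicit.postTypes + " / " + explicit.allowedPosTags.length + " tags");
    }
}
```
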
Method Detail

getDefaultContextDefinition

public static String getDefaultContextDefinition()
Gets the default context definition, which only contains the BL1 LP.

Returns:
The context definition string in XML that the default constructor uses.

getDefaultAllowedPOSTags

public static String[] getDefaultAllowedPOSTags()
Gets the array of part-of-speech (POS) tags that is assumed when a constructor without such an argument is used. The list consists of the POS tags of word tokens. Fine-tuning this list is recommended for optimal search results.

Returns:
The default array of allowed part-of-speech tags that the default constructor uses.
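
One way to fine-tune the list is to start from the default array and drop the tags you do not want indexed. The sketch below shows that step in isolation. The helper name `without` and the tag names are hypothetical; the actual English tag set is not listed on this page, so the array here is a placeholder for the value returned by getDefaultAllowedPOSTags().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PosTagTuningSketch {
    /** Returns a copy of base with every tag in drop removed; base is left untouched. */
    static String[] without(String[] base, String... drop) {
        List<String> kept = new ArrayList<>(Arrays.asList(base));
        kept.removeAll(Arrays.asList(drop));
        return kept.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Placeholder tags; with the real library this array would come from
        // RLPEnAnalyzer.getDefaultAllowedPOSTags().
        String[] defaults = {"NOUN", "VERB", "ADJ", "ADV"};
        String[] tuned = without(defaults, "ADV");
        System.out.println(Arrays.toString(tuned));
        // The tuned array would then be passed as allowedPOSTags to the
        // three-argument constructor, together with the values returned by
        // getDefaultContextDefinition() and getDefaultPostTypes().
    }
}
```
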

getDefaultPostTypes

public static EnumSet<RLPTokenizer.PostType> getDefaultPostTypes()
Gets the set of post types that is assumed when a constructor without such an argument is used. For English, this contains only STEM and POS. WORD is turned off because it would just double the number of Tokens.

Returns:
The default set of post types.
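
The effect of adding WORD can be illustrated with a small model. This sketch rests on an assumption drawn from the descriptions on this page, not from the RLP source: WORD and STEM each produce one token per input word, while POS travels in the token's payload and adds no token of its own. The PostType enum is a stand-in for RLPTokenizer.PostType, and tokensFor is a hypothetical helper.

```java
import java.util.EnumSet;

public class PostTypeSketch {
    /** Stand-in for RLPTokenizer.PostType, limited to the names this page mentions. */
    enum PostType { WORD, STEM, POS }

    /** The English default described above: STEM and POS, with WORD turned off. */
    static EnumSet<PostType> defaultPostTypes() {
        return EnumSet.of(PostType.STEM, PostType.POS);
    }

    /** Assumed model: one token per word for WORD and for STEM; POS adds none. */
    static int tokensFor(int wordCount, EnumSet<PostType> types) {
        int perWord = 0;
        if (types.contains(PostType.WORD)) {
            perWord++;
        }
        if (types.contains(PostType.STEM)) {
            perWord++;
        }
        return wordCount * perWord;
    }

    public static void main(String[] args) {
        EnumSet<PostType> defaults = defaultPostTypes();
        EnumSet<PostType> withWord = EnumSet.copyOf(defaults);
        withWord.add(PostType.WORD);
        System.out.println(defaults + " -> " + tokensFor(5, defaults) + " tokens");
        System.out.println(withWord + " -> " + tokensFor(5, withWord) + " tokens");
    }
}
```
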

main

public static void main(String[] args)
(Internal use only) Tokenizes an English sentence and displays the results.

Parameters:
args - The English sentence to process (args[0]). If you do not supply an argument, a default sentence is processed.


Copyright © 2007-2008 Basis Technology Corporation. All Rights Reserved.