com.basistech.rlp
Class RLPResultAccess

java.lang.Object
  extended by com.basistech.rlp.RLPResultAccess

public class RLPResultAccess
extends Object

Provides access to RLP results. Four general-purpose methods (getListResult, getMapResult, getStringResult, and getIntegerResult) return the result data. The following table indicates which method and type parameter (an int constant) to use for each RLP result type and describes the method's return value.

Method (type) Return value
List<Object> getListResult(RLPConstants.TOKEN) Each list element is a String: token.
List<Object> getListResult(RLPConstants.PART_OF_SPEECH) Each list element is a String: part-of-speech tag.
List<Object> getListResult(RLPConstants.SENTENCE_BOUNDARY) Each list element is an Integer: index of first token in the sentence.
List<Object> getListResult(RLPConstants.BASE_NOUN_PHRASE) Each list element is int[2]: start, end + 1 token indexes.
String getStringResult(RLPConstants.DETECTED_ENCODING) Returns String: MIME charset.
Integer getIntegerResult(RLPConstants.DETECTED_LANGUAGE) Returns Integer: BT language code. See LanguageCode.
Integer getIntegerResult(RLPConstants.DETECTED_SCRIPT) Returns Integer: ISO15924 script code. See ISO15924.
List<Object> getListResult(RLPConstants.NAMED_ENTITY) Each list element is int[3]: start and end + 1 token indexes, type.
List<Object> getListResult(RLPConstants.TOKEN_OFFSET) Each list element is int[2]: start and end + 1 char offsets.
List<Object> getListResult(RLPConstants.STEM) Each list element is a String: dictionary form of token.
List<Object> getListResult(RLPConstants.NORMALIZED_TOKEN) Each list element is a String: normalized form of token.
List<Object> getListResult(RLPConstants.MANY_TO_ONE_NORMALIZED_TOKEN) Each list element is a String: many-to-one normalized form of token.
Map<Integer,String[]> getMapResult(RLPConstants.COMPOUND) Each Map.Entry key is an Integer: associated token index. Each value is String[]: constituents of the compound word.
Map<Integer,String[]> getMapResult(RLPConstants.READING) Each Map.Entry key is an Integer: associated token index. Each value is String[]: alternate readings (transcriptions).
String getStringResult(RLPConstants.RAW_TEXT) Returns String: the full text.
String getStringResult(RLPConstants.TRANSCRIBED_TEXT) Returns String: transcription or transliteration of input text.
List<Object> getListResult(RLPConstants.STOPWORD) Each list element is an Integer: token index.
List<Object> getListResult(RLPConstants.GAZETTEER_NAMES) Each list element is a String: gazetteer name.
List<Object> getListResult(RLPConstants.ROOTS) Each list element is a String: the root (for Semitic languages).
List<Object> getListResult(RLPConstants.FLAGS) Each list element is an Integer: a status flag, as defined for a particular processor.
Map<Integer,String[]> getMapResult(RLPConstants.TOKEN_VARIATIONS) Each Map.Entry key is an Integer: associated token index. Each value is String[]: variant romanizations for the Arabic-script token.
List<Object> getListResult(RLPConstants.TEXT_BOUNDARIES) Each list element is an Integer: char offset + 1 of sentence-level text boundary.
List<Object> getListResult(RLPConstants.SCRIPT_REGION) Each list element is int[3]: start and end + 1 char offsets, ISO15924 script identifier.
List<Object> getListResult(RLPConstants.LANGUAGE_REGION) Each list element is int[6]: (1) start and (2) end + 1 char offsets, (3) level, (4) type, (5) script (not currently used), (6) BT language code (see LanguageCode).
List<Object> getListResult(RLPConstants.TOKEN_SOURCE_ID) Each list element is an Integer: index of the token's source name.
List<Object> getListResult(RLPConstants.TOKEN_SOURCE_NAME) Each list element is a String: a source name.
List<Object> getListResult(RLPConstants.LEMMA) Each list element is a String: a lemma.
Map<Integer,String[]> getMapResult(RLPConstants.ALTERNATIVE_LEMMAS) Each Map.Entry key is an Integer: associated token index. Each value is String[]: alternative lemmas for the Arabic token.
Map<Integer,String[]> getMapResult(RLPConstants.ALTERNATIVE_NORM) Each Map.Entry key is an Integer: associated token index. Each value is String[]: alternative normalized tokens for the Arabic token.
Map<Integer,String[]> getMapResult(RLPConstants.ALTERNATIVE_PARTS_OF_SPEECH) Each Map.Entry key is an Integer: associated token index. Each value is String[]: alternative parts of speech for the Arabic token.
Map<Integer,String[]> getMapResult(RLPConstants.ALTERNATIVE_ROOTS) Each Map.Entry key is an Integer: associated token index. Each value is String[]: alternative roots for the Arabic token.
Map<Integer,String[]> getMapResult(RLPConstants.ALTERNATIVE_STEMS) Each Map.Entry key is an Integer: associated token index. Each value is String[]: alternative stems for the Arabic token.
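Several list results identify tokens by their index in the TOKEN list, so results can be combined. The following minimal sketch groups tokens into sentences. It does not call RLP: the two List<Object> values are hypothetical stand-ins for getListResult(RLPConstants.TOKEN) and getListResult(RLPConstants.SENTENCE_BOUNDARY), and it assumes, as the table states, that each boundary is the index of the first token of a sentence, so each sentence runs up to the next boundary or to the end of the token list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SentenceSplitSketch {

    /**
     * Groups tokens into sentences. Assumes each SENTENCE_BOUNDARY element is the
     * index of the first token of a sentence; a sentence ends just before the next
     * boundary, and the last sentence ends at the end of the token list.
     */
    static List<List<String>> splitSentences(List<Object> tokens, List<Object> boundaries) {
        List<List<String>> sentences = new ArrayList<>();
        for (int i = 0; i < boundaries.size(); i++) {
            int start = (Integer) boundaries.get(i);
            int end = (i + 1 < boundaries.size()) ? (Integer) boundaries.get(i + 1) : tokens.size();
            List<String> sentence = new ArrayList<>();
            for (int t = start; t < end; t++) {
                sentence.add((String) tokens.get(t)); // TOKEN elements are Strings
            }
            sentences.add(sentence);
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for getListResult(RLPConstants.TOKEN)
        // and getListResult(RLPConstants.SENTENCE_BOUNDARY).
        List<Object> tokens = Arrays.<Object>asList("Hello", "world", ".", "Bye", ".");
        List<Object> boundaries = Arrays.<Object>asList(0, 3);
        System.out.println(splitSentences(tokens, boundaries));
        // prints [[Hello, world, .], [Bye, .]]
    }
}
```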


Constructor Summary
RLPResultAccess(RLPContext context)
          Instantiates an RLPResultAccess object that you can use to retrieve RLP results.
 
Method Summary
 boolean getConsistentType()
          Gets the consistent named entity type setting.
 LanguageCode getDetectedLanguage()
          Returns the language code for the input document.
 ISO15924 getDetectedScript()
          Returns the ISO15924 script code for the input document.
 Integer getIntegerResult(int type)
          Returns an Integer RLP result, for result types that contain a single int.
 List<Object> getListResult(int type)
          Returns a list of RLP results, for result types that include multiple Strings, ints (returned as Integers), or int arrays.
 Map<Integer,String[]> getMapResult(int type)
          Returns a sorted Map of RLP results, for result types that include a token index (returned as an Integer) and an array of Strings.
 NamedEntityData[] getNamedEntityData(boolean stripAffixes)
          For each named entity, returns the raw named entity (untouched), the normalized named entity (which matches the raw named entity if normalized tokens are not available for the language), entity type, string representation of entity type, token indexes, raw text offsets, and prefix lengths (for Arabic text).
 String getStringResult(int type)
          Returns a String, for result types that contain a single String.
 void setConsistentType(boolean flag)
          Sets the consistent named entity type setting.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RLPResultAccess

public RLPResultAccess(RLPContext context)
Instantiates an RLPResultAccess object that you can use to retrieve RLP results. You can use a single RLPResultAccess object to retrieve any and all results produced by processing a given input text stream with the specified RLPContext object.

Parameters:
context - The RLP context used to process text input.

Method Detail

getListResult

public List<Object> getListResult(int type)
                           throws RLPException
Returns a list of RLP results, for result types that include multiple Strings, ints (returned as Integers), or int arrays. Depending on the result type, cast each list element (the Object returned by List.get(int) or Iterator.next()) to String, Integer, or int[], as indicated below.
Result type Value of each list element
RLPConstants.TOKEN String: token.
RLPConstants.PART_OF_SPEECH String: part-of-speech tag.
RLPConstants.SENTENCE_BOUNDARY Integer: index of first token in the sentence.
RLPConstants.BASE_NOUN_PHRASE int[2]: start, end + 1 token indexes.
RLPConstants.NAMED_ENTITY int[3]: start and end + 1 token indexes, type.
RLPConstants.TOKEN_OFFSET int[2]: start and end + 1 char offsets.
RLPConstants.STEM String: dictionary form of token.
RLPConstants.NORMALIZED_TOKEN String: normalized form of token.
RLPConstants.MANY_TO_ONE_NORMALIZED_TOKEN String: many-to-one normalized form of token.
RLPConstants.STOPWORD Integer: token index.
RLPConstants.GAZETTEER_NAMES String: gazetteer name.
RLPConstants.ROOTS String: the root (for Semitic languages).
RLPConstants.LEMMA String: the lemma (for Arabic).
RLPConstants.FLAGS Integer: a status flag, as defined for a particular processor.
RLPConstants.MAP_OFFSETS Integer: char offset + 1 of transformed text.
RLPConstants.TEXT_BOUNDARIES Integer: char offset + 1 of sentence-level text boundary.
RLPConstants.SCRIPT_REGION int[3]: start and end + 1 char offsets, ISO15924 script identifier.
RLPConstants.LANGUAGE_REGION int[6]: (1) start and (2) end + 1 char offsets, (3) level, (4) type, (5) script (not currently used), (6) BT language code (see LanguageCode).
RLPConstants.TOKEN_SOURCE_ID Integer: index of the token's source name.
RLPConstants.TOKEN_SOURCE_NAME String: a source name.

Parameters:
type - RLP result type.
Returns:
List of String, Integer, or int[].
Throws:
RLPException - When called with a result type for which this method does not apply.
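As a sketch of the casting described above, the following code decodes NAMED_ENTITY results (int[3]: start token, end token + 1, type) against the TOKEN list. The sample lists are hypothetical stand-ins for getListResult output, the type codes 1 and 2 are placeholders rather than real RLP entity-type values, and joining tokens with single spaces is a simplification that does not suit languages written without spaces between words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NamedEntitySketch {

    /** Joins the tokens covered by each NAMED_ENTITY span and appends its type code. */
    static List<String> entityTexts(List<Object> tokens, List<Object> entities) {
        List<String> result = new ArrayList<>();
        for (Object element : entities) {
            int[] span = (int[]) element; // {start token, end token + 1, entity type}
            StringBuilder text = new StringBuilder();
            for (int t = span[0]; t < span[1]; t++) { // end + 1 makes the loop bound exclusive
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append((String) tokens.get(t));
            }
            result.add(text + " (type " + span[2] + ")");
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for getListResult(RLPConstants.TOKEN)
        // and getListResult(RLPConstants.NAMED_ENTITY).
        List<Object> tokens = Arrays.<Object>asList("John", "Smith", "visited", "Boston", ".");
        // Type codes 1 and 2 are placeholders, not real RLP entity-type values.
        List<Object> entities = Arrays.<Object>asList(new int[] {0, 2, 1}, new int[] {3, 4, 2});
        for (String e : entityTexts(tokens, entities)) {
            System.out.println(e);
        }
        // prints:
        // John Smith (type 1)
        // Boston (type 2)
    }
}
```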

getMapResult

public Map<Integer,String[]> getMapResult(int type)
                                   throws RLPException
Returns a sorted Map of RLP results, for result types that include a token index (returned as an Integer) and an array of Strings. For each map entry, the key is the Integer index of the related token and the value is an array of Strings.
Result type Value of each Map.Entry
RLPConstants.COMPOUND String[]: constituents of the compound word.
RLPConstants.READING String[]: alternate readings (transcriptions).
RLPConstants.TOKEN_VARIATIONS String[]: variant romanizations for Arabic-script tokens.
RLPConstants.ALTERNATIVE_LEMMAS String[]: alternative lemmas for Arabic token.
RLPConstants.ALTERNATIVE_NORM String[]: alternative normalized tokens for Arabic token.
RLPConstants.ALTERNATIVE_PARTS_OF_SPEECH String[]: alternative parts of speech for Arabic token.
RLPConstants.ALTERNATIVE_ROOTS String[]: alternative roots for Arabic token.
RLPConstants.ALTERNATIVE_STEMS String[]: alternative stems for Arabic token.

Parameters:
type - RLP result type.
Returns:
Map with Integer keys and String[] values.
Throws:
RLPException - When called with a result type for which this method does not apply.
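A minimal sketch of consuming a getMapResult result, assuming the TOKEN list and a COMPOUND map are already available. Both are hypothetical stand-ins here: a TreeMap mirrors the documented sorted Map, and the German compound and its split are illustrative data, not actual RLP output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CompoundSketch {

    /** Formats each COMPOUND entry as "token -> constituent+constituent", one per line. */
    static String describeCompounds(List<Object> tokens, Map<Integer, String[]> compounds) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Integer, String[]> entry : compounds.entrySet()) {
            String token = (String) tokens.get(entry.getKey()); // key: token index
            out.append(token)
               .append(" -> ")
               .append(String.join("+", entry.getValue()))     // value: constituents
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for getListResult(RLPConstants.TOKEN)
        // and getMapResult(RLPConstants.COMPOUND).
        List<Object> tokens = Arrays.<Object>asList("Der", "Bahnhof", "ist", "alt");
        Map<Integer, String[]> compounds = new TreeMap<>();
        compounds.put(1, new String[] {"Bahn", "Hof"});
        System.out.print(describeCompounds(tokens, compounds));
        // prints: Bahnhof -> Bahn+Hof
    }
}
```

The same loop applies to the other map result types (READING, TOKEN_VARIATIONS, and the ALTERNATIVE_* types), since they share the Integer-key, String[]-value shape.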

getStringResult

public String getStringResult(int type)
                       throws RLPException
Returns a String, for result types that contain a single String. Use this method to get the following result types:
Result type Value
RLPConstants.DETECTED_ENCODING the MIME charset
RLPConstants.RAW_TEXT complete input text
RLPConstants.TRANSCRIBED_TEXT transliteration or transcription of complete input text

Parameters:
type - RLP result type.
Returns:
MIME charset, complete text, or transcription/transliteration of complete text.
Throws:
RLPException - When called with a result type for which this method does not apply.
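Character-offset results such as TOKEN_OFFSET describe positions in the input text. The sketch below cuts each token out of the text, assuming the offsets are Java char (UTF-16) indexes into the string returned for RLPConstants.RAW_TEXT. Because each span ends at end + 1, it maps directly onto String.substring, whose end index is exclusive. The text and offsets are hypothetical stand-ins for getStringResult(RLPConstants.RAW_TEXT) and getListResult(RLPConstants.TOKEN_OFFSET).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OffsetSketch {

    /** Extracts token strings from the raw text using TOKEN_OFFSET spans. */
    static List<String> tokensFromOffsets(String rawText, List<Object> offsets) {
        List<String> result = new ArrayList<>();
        for (Object element : offsets) {
            int[] span = (int[]) element; // {start, end + 1} char offsets
            // substring's end index is exclusive, which matches the end + 1 convention
            result.add(rawText.substring(span[0], span[1]));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the RAW_TEXT string and TOKEN_OFFSET list.
        String rawText = "Hello, world.";
        List<Object> offsets = Arrays.<Object>asList(
                new int[] {0, 5}, new int[] {5, 6}, new int[] {7, 12}, new int[] {12, 13});
        System.out.println(String.join("|", tokensFromOffsets(rawText, offsets)));
        // prints Hello|,|world|.
    }
}
```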

getIntegerResult

public Integer getIntegerResult(int type)
                         throws RLPException
Returns an Integer RLP result, for result types that contain a single int. Use this method with the RLPConstants.DETECTED_LANGUAGE or RLPConstants.DETECTED_SCRIPT result type to get the language code or the script code, respectively. See LanguageCode and ISO15924.

Parameters:
type - RLP result type.
Returns:
The language code or script code, as an Integer.
Throws:
RLPException - When called with a result type for which this method does not apply.

getDetectedLanguage

public LanguageCode getDetectedLanguage()
                                 throws RLPException
Returns the language code for the input document.

Returns:
language code
Throws:
RLPException

getDetectedScript

public ISO15924 getDetectedScript()
                           throws RLPException
Returns the ISO15924 script code for the input document. May return null if neither the user nor RLI has designated a script.

Returns:
script code
Throws:
RLPException

setConsistentType

public void setConsistentType(boolean flag)
Sets the consistent named entity type setting.

Parameters:
flag - if true, assign the named entity type of the first occurrence of a named entity to all subsequent named entities whose normalized form matches it.
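To make the consistent-type rule concrete, the following sketch models it with plain Java collections. It illustrates the behavior described for flag = true and is not RLP's implementation: the Entity class and the integer type codes are hypothetical, and each entity simply receives the type recorded for the first entity with the same normalized form.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConsistentTypeSketch {

    /** Hypothetical pairing of a normalized entity string and its type code. */
    static final class Entity {
        final String normalized;
        final int type;

        Entity(String normalized, int type) {
            this.normalized = normalized;
            this.type = type;
        }
    }

    /** Returns, for each entity in order, the type of the first entity with the same normalized form. */
    static List<Integer> consistentTypes(List<Entity> entities) {
        Map<String, Integer> firstType = new HashMap<>();
        List<Integer> types = new ArrayList<>();
        for (Entity e : entities) {
            // putIfAbsent keeps the type recorded for the first occurrence
            firstType.putIfAbsent(e.normalized, e.type);
            types.add(firstType.get(e.normalized));
        }
        return types;
    }

    public static void main(String[] args) {
        List<Entity> entities = new ArrayList<>();
        entities.add(new Entity("washington", 1)); // first occurrence, placeholder type 1
        entities.add(new Entity("washington", 2)); // later occurrence tagged differently
        entities.add(new Entity("boston", 2));
        System.out.println(consistentTypes(entities)); // prints [1, 1, 2]
    }
}
```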

getConsistentType

public boolean getConsistentType()
Gets the consistent named entity type setting.

Returns:
The consistent named entity type setting.

getNamedEntityData

public NamedEntityData[] getNamedEntityData(boolean stripAffixes)
                                     throws RLPException
For each named entity, returns the raw named entity (untouched), the normalized named entity (which matches the raw named entity if normalized tokens are not available for the language), entity type, string representation of entity type, token indexes, raw text offsets, and prefix lengths (for Arabic text).

Parameters:
stripAffixes - if true, strip any affix (always a prefix) from the first token of each named entity (Arabic text only).
Returns:
array of NamedEntityData objects
Throws:
RLPException


Copyright © 2004-2008 Basis Technology Corporation. All Rights Reserved.