java.lang.Object
  com.basistech.rlp.RLPResultAccess
public class RLPResultAccess
Provides access to RLP results. Includes four methods for returning result data. The following table indicates which method and type parameter (an int constant) to use for each RLP result type and describes the method's return value.
Method(type) | Return value |
---|---|
List<Object> getListResult(RLPConstants.TOKEN) | Each list element is a String: token. |
List<Object> getListResult(RLPConstants.PART_OF_SPEECH) | Each list element is a String: part-of-speech tag. |
List<Object> getListResult(RLPConstants.SENTENCE_BOUNDARY) | Each list element is an Integer: index of first token in the sentence. |
List<Object> getListResult(RLPConstants.BASE_NOUN_PHRASE) | Each list element is int[2]: start and end + 1 token indexes. |
String getStringResult(RLPConstants.DETECTED_ENCODING) | Returns String: MIME charset. |
Integer getIntegerResult(RLPConstants.DETECTED_LANGUAGE) | Returns Integer: BT language code. See LanguageCode. |
Integer getIntegerResult(RLPConstants.DETECTED_SCRIPT) | Returns Integer: ISO15924 script code. See ISO15924. |
List<Object> getListResult(RLPConstants.NAMED_ENTITY) | Each list element is int[3]: start and end + 1 token indexes, type. |
List<Object> getListResult(RLPConstants.TOKEN_OFFSET) | Each list element is int[2]: start and end + 1 char offsets. |
List<Object> getListResult(RLPConstants.STEM) | Each list element is a String: dictionary form of token. |
List<Object> getListResult(RLPConstants.NORMALIZED_TOKEN) | Each list element is a String: normalized form of token. |
List<Object> getListResult(RLPConstants.MANY_TO_ONE_NORMALIZED_TOKEN) | Each list element is a String: many-to-one normalized form of token. |
Map<Integer,String[]> getMapResult(RLPConstants.COMPOUND) | Each Map.Entry key is an Integer: associated token index. Each value is String[]: constituents of the compound word. |
Map<Integer,String[]> getMapResult(RLPConstants.READING) | Each Map.Entry key is an Integer: associated token index. Each value is String[]: alternate readings (transcriptions). |
String getStringResult(RLPConstants.RAW_TEXT) | Returns String: the full text. |
String getStringResult(RLPConstants.TRANSCRIBED_TEXT) | Returns String: transcription or transliteration of input text. |
List<Object> getListResult(RLPConstants.STOPWORD) | Each list element is an Integer: token index. |
List<Object> getListResult(RLPConstants.GAZETTEER_NAMES) | Each list element is a String: gazetteer name. |
List<Object> getListResult(RLPConstants.ROOTS) | Each list element is a String: the root (for Semitic languages). |
List<Object> getListResult(RLPConstants.FLAGS) | Each list element is an Integer: a status flag, as defined for a particular processor. |
Map<Integer,String[]> getMapResult(RLPConstants.TOKEN_VARIATIONS) | Each Map.Entry key is an Integer: associated token index. Each value is String[]: variant romanizations for Arabic script token. |
List<Object> getListResult(RLPConstants.TEXT_BOUNDARIES) | Each list element is an Integer: char offset + 1 of sentence-level text boundary. |
List<Object> getListResult(RLPConstants.SCRIPT_REGION) | Each list element is int[3]: start and end + 1 char offsets, ISO15924 script identifier. |
List<Object> getListResult(RLPConstants.LANGUAGE_REGION) | Each list element is int[6]: (1) start and (2) end + 1 char offsets, (3) level, (4) type, (5) script (not currently used), (6) BT language code (see LanguageCode). |
List<Object> getListResult(RLPConstants.TOKEN_SOURCE_ID) | Each list element is an Integer: index of source name of token. |
List<Object> getListResult(RLPConstants.TOKEN_SOURCE_NAME) | Each list element is a String: a source name. |
List<Object> getListResult(RLPConstants.LEMMA) | Each list element is a String: a lemma. |
Map<Integer,String[]> getMapResult(RLPConstants.ALTERNATIVE_LEMMAS) | Each Map.Entry key is an Integer: associated token index. Each value is String[]: alternative lemma for Arabic token. |
Map<Integer,String[]> getMapResult(RLPConstants.ALTERNATIVE_NORM) | Each Map.Entry key is an Integer: associated token index. Each value is String[]: alternative normalized token for Arabic token. |
Map<Integer,String[]> getMapResult(RLPConstants.ALTERNATIVE_PARTS_OF_SPEECH) | Each Map.Entry key is an Integer: associated token index. Each value is String[]: alternative part of speech for Arabic token. |
Map<Integer,String[]> getMapResult(RLPConstants.ALTERNATIVE_ROOTS) | Each Map.Entry key is an Integer: associated token index. Each value is String[]: alternative root for Arabic token. |
Map<Integer,String[]> getMapResult(RLPConstants.ALTERNATIVE_STEMS) | Each Map.Entry key is an Integer: associated token index. Each value is String[]: alternative stem for Arabic token. |
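The table describes NAMED_ENTITY elements as int[3] triples of start token index, end + 1 token index, and type. The following is a minimal sketch of decoding such triples against a token list. It does not call RLP: the two lists are hand-built stand-ins shaped like the values getListResult is documented to return, and the class name, method name, and type code 7 are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NamedEntitySpans {
    /** Joins the tokens covered by each {start, end + 1, type} triple. */
    static List<String> entityTexts(List<Object> tokens, List<Object> entities) {
        List<String> out = new ArrayList<>();
        for (Object o : entities) {
            int[] span = (int[]) o; // NAMED_ENTITY elements are int[3]
            StringBuilder sb = new StringBuilder();
            // span[1] is end + 1, so it serves as an exclusive upper bound.
            for (int i = span[0]; i < span[1]; i++) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append((String) tokens.get(i)); // TOKEN elements are Strings
            }
            out.add(sb + " [type " + span[2] + "]");
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in data (assumption): in real use these lists would come from
        // getListResult(RLPConstants.TOKEN) and getListResult(RLPConstants.NAMED_ENTITY).
        List<Object> tokens = Arrays.<Object>asList("George", "Washington", "slept", "here");
        List<Object> entities = new ArrayList<>();
        entities.add(new int[] {0, 2, 7}); // 7 is a made-up type code
        System.out.println(entityTexts(tokens, entities));
    }
}
```

Because the end index is stored as end + 1, the loop needs no off-by-one adjustment. Joining tokens with a space is a simplification; the raw text offsets may be a better source for the exact surface string.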
Constructor Summary | |
---|---|
RLPResultAccess(RLPContext context) | Instantiates an RLPResultAccess object that you can use to retrieve RLP results. |
Method Summary | |
---|---|
boolean | getConsistentType() - Gets the consistent named entity type setting. |
LanguageCode | getDetectedLanguage() - Returns the language code for the input document. |
ISO15924 | getDetectedScript() - Returns the ISO15924 script code for the input document. |
Integer | getIntegerResult(int type) - Returns an Integer RLP result; for result types that contain a single int. |
List<Object> | getListResult(int type) - Returns a list of RLP results; for result types that include multiple Strings, ints (returned as Integers), or int arrays. |
Map<Integer,String[]> | getMapResult(int type) - Returns a sorted Map of RLP results; for result types that include a token index (returned as an Integer) and an array of strings. |
NamedEntityData[] | getNamedEntityData(boolean stripAffixes) - For each named entity, returns the raw named entity (untouched), the normalized named entity (which matches the raw named entity if normalized tokens are not available for the language), entity type, string representation of entity type, token indexes, raw text offsets, and prefix lengths (for Arabic text). |
String | getStringResult(int type) - Returns a String; for result types that contain a single String. |
void | setConsistentType(boolean flag) - Sets the consistent named entity type setting. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public RLPResultAccess(RLPContext context)
Parameters:
context - The RLP context used to process text input.
Method Detail |
---|
public List<Object> getListResult(int type) throws RLPException
Cast each list element (returned by List.get(int) or Iterator.next()) to String, Integer, or int[] as indicated below.
Result type | Value of each list element |
---|---|
RLPConstants.TOKEN | String: token. |
RLPConstants.PART_OF_SPEECH | String: part-of-speech tag. |
RLPConstants.SENTENCE_BOUNDARY | Integer: index of first token in the sentence. |
RLPConstants.BASE_NOUN_PHRASE | int[2]: start and end + 1 token indexes. |
RLPConstants.NAMED_ENTITY | int[3]: start and end + 1 token indexes, type. |
RLPConstants.TOKEN_OFFSET | int[2]: start and end + 1 char offsets. |
RLPConstants.STEM | String: dictionary form of token. |
RLPConstants.NORMALIZED_TOKEN | String: normalized form of token. |
RLPConstants.MANY_TO_ONE_NORMALIZED_TOKEN | String: many-to-one normalized form of token. |
RLPConstants.STOPWORD | Integer: token index. |
RLPConstants.GAZETTEER_NAMES | String: gazetteer name. |
RLPConstants.ROOTS | String: the root (for Semitic languages). |
RLPConstants.LEMMA | String: the lemma (for Arabic). |
RLPConstants.FLAGS | Integer: a status flag, as defined for a particular processor. |
RLPConstants.MAP_OFFSETS | Integer: char offset + 1 of transformed text. |
RLPConstants.TEXT_BOUNDARIES | Integer: char offset + 1 of sentence-level text boundary. |
RLPConstants.SCRIPT_REGION | int[3]: start and end + 1 char offsets, ISO15924 script identifier. |
RLPConstants.LANGUAGE_REGION | int[6]: (1) start and (2) end + 1 char offsets, (3) level, (4) type, (5) script (not currently used), (6) BT language code (see LanguageCode). |
RLPConstants.TOKEN_SOURCE_ID | Integer: index of source name of token. |
RLPConstants.TOKEN_SOURCE_NAME | String: list of source names. |
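Since getListResult returns List<Object>, the element type depends on the result type requested. The sketch below illustrates the three casts (String, Integer, int[]) on hand-built stand-in lists rather than on a live RLP context. The class and method names are hypothetical, and it assumes the TOKEN_OFFSET char offsets index into the RAW_TEXT string as Java char positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListResultCasts {
    /** Recovers each token's surface text from {start, end + 1} char offsets. */
    static List<String> surfaceForms(String rawText, List<Object> offsets) {
        List<String> out = new ArrayList<>();
        for (Object o : offsets) {
            int[] range = (int[]) o; // TOKEN_OFFSET elements are int[2]
            // end + 1 is exclusive, which matches String.substring.
            out.add(rawText.substring(range[0], range[1]));
        }
        return out;
    }

    /** Looks up the tokens named by a list of Integer token indexes. */
    static List<String> stopwords(List<Object> tokens, List<Object> stopwordIndexes) {
        List<String> out = new ArrayList<>();
        for (Object o : stopwordIndexes) {
            int index = (Integer) o;             // STOPWORD elements are Integers
            out.add((String) tokens.get(index)); // TOKEN elements are Strings
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in data (assumption): shaped like RAW_TEXT, TOKEN_OFFSET,
        // TOKEN, and STOPWORD results, but not produced by RLP.
        String rawText = "The cat sat";
        List<Object> offsets = new ArrayList<>();
        offsets.add(new int[] {0, 3});
        offsets.add(new int[] {4, 7});
        offsets.add(new int[] {8, 11});
        List<Object> tokens = Arrays.<Object>asList("The", "cat", "sat");
        List<Object> stopwordIndexes = Arrays.<Object>asList(0);

        System.out.println(surfaceForms(rawText, offsets));
        System.out.println(stopwords(tokens, stopwordIndexes));
    }
}
```

A wrong cast fails at run time with ClassCastException, so it helps to keep each cast next to the result type constant it belongs to.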
Parameters:
type - RLP result type.
Throws:
RLPException - When called with a result type for which this method does not apply.
public Map<Integer,String[]> getMapResult(int type) throws RLPException
Result type | Value of each Map.Entry |
---|---|
RLPConstants.COMPOUND | String[]: constituents of the compound word. |
RLPConstants.READING | String[]: alternate readings (transcriptions). |
RLPConstants.TOKEN_VARIATIONS | String[]: variant romanizations for Arabic script tokens. |
RLPConstants.ALTERNATIVE_LEMMAS | String[]: alternative lemmas for Arabic token. |
RLPConstants.ALTERNATIVE_NORM | String[]: alternative normalized tokens for Arabic token. |
RLPConstants.ALTERNATIVE_PARTS_OF_SPEECH | String[]: alternative parts of speech for Arabic token. |
RLPConstants.ALTERNATIVE_ROOTS | String[]: alternative roots for Arabic token. |
RLPConstants.ALTERNATIVE_STEMS | String[]: alternative stems for Arabic token. |
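Each map result is keyed by token index, so it is typically read alongside the TOKEN list. The following sketch pairs each key with its token and joins the String[] value. The TreeMap and token list are hand-built stand-ins for getMapResult(RLPConstants.COMPOUND) and getListResult(RLPConstants.TOKEN); the class name, method name, and sample compound are illustrative assumptions, not RLP output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CompoundDemo {
    /** Formats each entry as "token = constituent + constituent ...". */
    static List<String> describe(List<Object> tokens, Map<Integer, String[]> compounds) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Integer, String[]> entry : compounds.entrySet()) {
            // The key is the index of the token the value belongs to.
            String token = (String) tokens.get(entry.getKey());
            lines.add(token + " = " + String.join(" + ", entry.getValue()));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in data (assumption): only token 1 has a compound analysis.
        List<Object> tokens = Arrays.<Object>asList("Die", "Lebensversicherung");
        Map<Integer, String[]> compounds = new TreeMap<>();
        compounds.put(1, new String[] {"Leben", "Versicherung"});

        for (String line : describe(tokens, compounds)) {
            System.out.println(line);
        }
    }
}
```

Tokens with no entry in the map are simply skipped, so code that needs a value for every token should check containsKey or fall back to a default.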
Parameters:
type - RLP result type.
Throws:
RLPException - When called with a result type for which this method does not apply.
public String getStringResult(int type) throws RLPException
Result type | Value |
---|---|
RLPConstants.DETECTED_ENCODING | the MIME charset |
RLPConstants.RAW_TEXT | complete input text |
RLPConstants.TRANSCRIBED_TEXT | transliteration or transcription of complete input text |
Parameters:
type - RLP result type.
Throws:
RLPException - When called with a result type for which this method does not apply.
public Integer getIntegerResult(int type) throws RLPException
Call with the RLPConstants.DETECTED_LANGUAGE result type or with the RLPConstants.DETECTED_SCRIPT result type to get the language code or the script code, respectively. See LanguageCode or ISO15924.
Parameters:
type - RLP result type.
Throws:
RLPException - When called with a result type for which this method does not apply.
public LanguageCode getDetectedLanguage() throws RLPException
Throws:
RLPException
public ISO15924 getDetectedScript() throws RLPException
Throws:
RLPException
public void setConsistentType(boolean flag)
Parameters:
flag - if true, assign the named entity type of the first occurring named entity to all subsequent matching normalized named entities.
public boolean getConsistentType()
public NamedEntityData[] getNamedEntityData(boolean stripAffixes) throws RLPException
Parameters:
stripAffixes - if true, strip affixes (always a prefix), if any, from the first token in each named entity (Arabic text only).
Throws:
RLPException
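The behavior described for setConsistentType(true), where the type of the first occurrence of a normalized named entity is applied to later matching occurrences, can be pictured with the sketch below. This is an illustration of the documented rule only, not RLP's implementation: the Entity class, its fields, and makeConsistent are hypothetical, and the type codes are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ConsistentTypeSketch {
    /** Hypothetical pair of a normalized entity string and its type code. */
    static final class Entity {
        final String normalized;
        final int type;

        Entity(String normalized, int type) {
            this.normalized = normalized;
            this.type = type;
        }
    }

    /** Gives every entity the type of the first entity with the same normalized form. */
    static List<Entity> makeConsistent(List<Entity> entities) {
        Map<String, Integer> firstType = new HashMap<>();
        List<Entity> out = new ArrayList<>();
        for (Entity e : entities) {
            // putIfAbsent returns null the first time a normalized form is seen,
            // otherwise the type remembered from the first occurrence.
            Integer seen = firstType.putIfAbsent(e.normalized, e.type);
            out.add(new Entity(e.normalized, seen == null ? e.type : seen));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up input: the second "paris" starts with a different type code.
        List<Entity> input = Arrays.asList(
                new Entity("paris", 1), new Entity("paris", 2), new Entity("apple", 3));
        for (Entity e : makeConsistent(input)) {
            System.out.println(e.normalized + " -> " + e.type);
        }
    }
}
```

With the flag left false, the equivalent of this pass would be skipped and each occurrence would keep whatever type it was originally assigned.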