com.basistech.rlp
Class RLPResultRandomAccess

java.lang.Object
  extended by com.basistech.rlp.RLPResultRandomAccess

public class RLPResultRandomAccess
extends Object

Low-level data access to RLP results. This class retrieves results from an RLP context in 'minimally-processed' form.


Constructor Summary
RLPResultRandomAccess(RLPContext a_context)
          Creates a result accessor for the given context.
 
Method Summary
 boolean getConsistentType()
          Gets the consistent named entity type setting.
 LanguageCode getDetectedLanguage()
          Returns an object describing the language of the data in the context.
 NamedEntityData[] getNamedEntities(boolean stripAffixes)
          Returns all of the named entities from the context, optionally stripping affixes.
 Object getResultData(int resultType)
          Retrieves data for an RLP result item with minimal processing for access from Java.
 void setConsistentType(boolean flag)
          Sets whether named entity types are made consistent.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RLPResultRandomAccess

public RLPResultRandomAccess(RLPContext a_context)
Creates a result accessor for the given context.

Parameters:
a_context - Source context

Method Detail

setConsistentType

public void setConsistentType(boolean flag)
Sets whether named entity types are made consistent.

Parameters:
flag - if true, assigns the named entity type of the first-occurring named entity to all subsequent named entities with a matching normalized form.
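
The effect of this flag can be pictured with a small, self-contained sketch. It does not use or reproduce the RLP implementation: the Entity class, its fields, and applyConsistentType are hypothetical names introduced only for illustration, and the sketch assumes that entities "match" when their normalized text is equal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of "consistent type" behavior; not RLP code. */
final class ConsistentTypeSketch {

    /** Hypothetical stand-in for an entity: normalized text plus a type code. */
    static final class Entity {
        final String normalized;
        final int type;

        Entity(String normalized, int type) {
            this.normalized = normalized;
            this.type = type;
        }
    }

    /**
     * Returns a copy of the input in which every entity carries the type of
     * the first entity that has the same normalized text.
     */
    static List<Entity> applyConsistentType(List<Entity> entities) {
        Map<String, Integer> firstTypeByName = new HashMap<String, Integer>();
        List<Entity> result = new ArrayList<Entity>(entities.size());
        for (Entity e : entities) {
            Integer firstType = firstTypeByName.get(e.normalized);
            if (firstType == null) {
                // First occurrence: remember its type for later matches.
                firstType = e.type;
                firstTypeByName.put(e.normalized, firstType);
            }
            result.add(new Entity(e.normalized, firstType));
        }
        return result;
    }
}
```

In this sketch, if "paris" first appears with type 1 and later with type 3, the later occurrence is reported with type 1.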

getConsistentType

public boolean getConsistentType()
Gets the consistent named entity type setting.

Returns:
The consistent named entity type setting.

getNamedEntities

public NamedEntityData[] getNamedEntities(boolean stripAffixes)
Returns all of the named entities from the context, optionally stripping affixes. This method is a convenience that combines data from several of the result types returned by getResultData.

Parameters:
stripAffixes - Whether to remove conjunctions and similar attached particles to deliver 'clean' names.
Returns:
An array of objects containing the information for the entities.

getDetectedLanguage

public LanguageCode getDetectedLanguage()
Returns an object describing the language of the data in the context. By contrast, getResultData returns only an integer code for the detected language, for compatibility reasons.

Returns:
The language.

getResultData

public Object getResultData(int resultType)
Retrieves data for an RLP result item with minimal processing for access from Java. Several result types consist of vectors of tuples of integers. For example, the TOKEN_OFFSET result type consists of a vector of pairs of integers. In these cases, the return value is a single IntBuffer, and the caller must unpack it according to the arity of the tuples. For example, for NAMED_ENTITY, the resulting IntBuffer consists of triples of integers.
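
Unpacking a flat buffer by tuple arity might look like the following sketch. It uses only java.nio.IntBuffer; the TupleUnpacking class and its unpack method are hypothetical helpers, not part of RLP, and the sketch assumes the tuples start at the buffer's current position.

```java
import java.nio.IntBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for splitting a flat IntBuffer into fixed-size tuples. */
final class TupleUnpacking {

    /** Splits the remaining ints of a flat buffer into tuples of the given arity. */
    static List<int[]> unpack(IntBuffer flat, int arity) {
        // Work on a duplicate so the caller's position and limit are untouched.
        IntBuffer view = flat.duplicate();
        if (arity <= 0 || view.remaining() % arity != 0) {
            throw new IllegalArgumentException(
                "buffer length is not a multiple of arity " + arity);
        }
        List<int[]> tuples = new ArrayList<int[]>(view.remaining() / arity);
        while (view.hasRemaining()) {
            int[] tuple = new int[arity];
            view.get(tuple); // bulk read of one tuple
            tuples.add(tuple);
        }
        return tuples;
    }
}
```

With a buffer obtained from getResultData and cast to IntBuffer, a caller would pass arity 3 for NAMED_ENTITY (start, end, type) and arity 2 for TOKEN_OFFSET (start, end).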

Parameters:
resultType - int constant identifying the data to retrieve.
Returns:
The return value, typed as appropriate for the result type:
Result Type                     Return Type                   Comment
RLPConstants.TOKEN              String[]                      One string per token.
RLPConstants.PART_OF_SPEECH     String[]                      One string per token.
RLPConstants.SENTENCE_BOUNDARY  IntBuffer                     Token index of the end of each sentence.
RLPConstants.BASE_NOUN_PHRASE   IntBuffer                     Start/end token index pairs. See: RLPIntegerPair.List
RLPConstants.DETECTED_ENCODING  String                        MIME charset.
RLPConstants.DETECTED_LANGUAGE  Integer                       Integer language code. See: LanguageCode
RLPConstants.DETECTED_SCRIPT    Integer                       ISO 15924 script code.
RLPConstants.NAMED_ENTITY       IntBuffer                     Three ints per entity: start/end/type. See: RLPIntegerTriple.List
RLPConstants.TOKEN_OFFSET       IntBuffer                     Two ints per token: start/end char offset. See: RLPIntegerPair.List
RLPConstants.STEM               String[]                      One string per token.
RLPConstants.NORMALIZED_TOKEN   String[]                      One string per token.
RLPConstants.COMPOUND           SortedMap<Integer, String[]>  Maps each compound word at the index to an array of its components.
RLPConstants.READING            SortedMap<Integer, String[]>  Maps each word at the index to an array of its possible readings.
RLPConstants.RAW_TEXT           CharBuffer                    The full text.
RLPConstants.STOPWORD           IntBuffer                     The list of stopword tokens.
RLPConstants.GAZETTEER_NAMES    String[]                      The list of gazetteer names.
RLPConstants.ROOTS              String[]                      The roots (for Semitic languages).
RLPConstants.FLAGS              IntBuffer[]                   Status flags, as defined for particular processors.
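
The map-valued result types in the table above can be read alongside the TOKEN array, as in the sketch below. It assumes that each map key is an index into the TOKEN array, which the table suggests but does not state outright. The CompoundLookup class and the sample data are hypothetical, and no RLP classes are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

/** Hypothetical helper pairing a COMPOUND-style map with a TOKEN-style array. */
final class CompoundLookup {

    /** Formats each compound as "token = part+part", in ascending index order. */
    static List<String> describe(String[] tokens,
                                 SortedMap<Integer, String[]> compounds) {
        List<String> lines = new ArrayList<String>();
        for (Map.Entry<Integer, String[]> entry : compounds.entrySet()) {
            // Assumption: the key is the index of the compound in the token array.
            int tokenIndex = entry.getKey();
            lines.add(tokens[tokenIndex] + " = "
                + String.join("+", entry.getValue()));
        }
        return lines;
    }
}
```

Because getResultData is declared to return Object, a real caller would need an unchecked cast to SortedMap<Integer, String[]> before passing the result to a helper like this one.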


Copyright © 2004-2008 Basis Technology Corporation. All Rights Reserved.