opennlp.tools.sentdetect
Class DefaultSDContextGenerator

java.lang.Object
  extended by opennlp.tools.sentdetect.DefaultSDContextGenerator
All Implemented Interfaces:
SDContextGenerator
Direct Known Subclasses:
SentenceContextGenerator

public class DefaultSDContextGenerator
extends Object
implements SDContextGenerator

Generates event contexts for maxent (maximum entropy) decisions in sentence detection.


Field Summary
protected  StringBuffer buf
          String buffer for generating features.
protected  List<String> collectFeats
          List for holding features as they are generated.
 
Constructor Summary
DefaultSDContextGenerator(char[] eosCharacters)
          Creates a new SDContextGenerator instance with no induced abbreviations.
DefaultSDContextGenerator(Set<String> inducedAbbreviations, char[] eosCharacters)
          Creates a new SDContextGenerator instance which uses the set of induced abbreviations.
 
Method Summary
protected  void collectFeatures(String prefix, String suffix, String previous, String next)
          Determines some of the features for the sentence detector and adds them to the list of features.
 String[] getContext(CharSequence sb, int position)
          Returns an array of contextual features for the potential sentence boundary at the specified position within the specified string buffer.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

buf

protected StringBuffer buf
String buffer for generating features.


collectFeats

protected List<String> collectFeats
List for holding features as they are generated.

Constructor Detail

DefaultSDContextGenerator

public DefaultSDContextGenerator(char[] eosCharacters)
Creates a new SDContextGenerator instance with no induced abbreviations.

Parameters:
eosCharacters - the characters that may mark the end of a sentence (eos).

DefaultSDContextGenerator

public DefaultSDContextGenerator(Set<String> inducedAbbreviations,
                                 char[] eosCharacters)
Creates a new SDContextGenerator instance which uses the set of induced abbreviations.

Parameters:
inducedAbbreviations - a Set of Strings representing induced abbreviations in the training data. Example: "Mr."
eosCharacters - the characters that may mark the end of a sentence (eos).
Method Detail

getContext

public String[] getContext(CharSequence sb,
                           int position)
Description copied from interface: SDContextGenerator
Returns an array of contextual features for the potential sentence boundary at the specified position within the specified string buffer.

Specified by:
getContext in interface SDContextGenerator
Parameters:
sb - The String for which sentences are being determined.
position - An index into the specified string buffer where a sentence boundary may occur.
Returns:
an array of contextual features for the potential sentence boundary at the specified position within the specified string buffer.
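
The position argument identifies a candidate boundary, that is, an index at which one of the eos characters occurs. As a minimal sketch of how such candidate indexes might be collected before calling getContext, the following scans a CharSequence for any of the given eos characters. The class EosCandidates and its positions method are hypothetical names introduced for illustration only; they are not part of this package, and the actual scanner used by the sentence detector may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of opennlp.tools.sentdetect.
final class EosCandidates {

    // Returns every index in text whose character is one of eosCharacters.
    static List<Integer> positions(CharSequence text, char[] eosCharacters) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            for (char eos : eosCharacters) {
                if (c == eos) {
                    result.add(i);
                    break; // one match is enough for this index
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        char[] eos = {'.', '?', '!'};
        System.out.println(positions("Hi. Ok? Yes!", eos)); // prints [2, 6, 11]
    }
}
```

Each returned index could then be passed as the position argument, together with the same text as sb.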

collectFeatures

protected void collectFeatures(String prefix,
                               String suffix,
                               String previous,
                               String next)
Determines some of the features for the sentence detector and adds them to the list of features.

Parameters:
prefix - String preceding the eos character in the eos token.
suffix - String following the eos character in the eos token.
previous - Space-delimited token preceding the token containing the eos character.
next - Space-delimited token following the token containing the eos character.
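
To make the four parameters concrete, here is a rough sketch of how a text could be split around an eos character into prefix, suffix, previous, and next. The class EosTokenParts and its split method are hypothetical and written only for this illustration. The sketch assumes tokens are separated by single spaces and that the character at position is not itself a space; the class's own tokenization is not described on this page and may handle whitespace differently.

```java
// Hypothetical helper, not part of opennlp.tools.sentdetect.
final class EosTokenParts {

    // Returns {prefix, suffix, previous, next} for the eos character at position.
    // Assumes single-space token delimiters and a non-space character at position.
    static String[] split(CharSequence text, int position) {
        String s = text.toString();

        // Bounds of the space-delimited token that contains the eos character.
        int tokenStart = s.lastIndexOf(' ', position) + 1;
        int tokenEnd = s.indexOf(' ', position);
        if (tokenEnd < 0) {
            tokenEnd = s.length();
        }

        String prefix = s.substring(tokenStart, position);
        String suffix = s.substring(position + 1, tokenEnd);

        // Last token before the eos token (empty if there is none).
        String before = s.substring(0, tokenStart).trim();
        String previous = before.substring(before.lastIndexOf(' ') + 1);

        // First token after the eos token (empty if there is none).
        String after = s.substring(tokenEnd).trim();
        int space = after.indexOf(' ');
        String next = space < 0 ? after : after.substring(0, space);

        return new String[] {prefix, suffix, previous, next};
    }

    public static void main(String[] args) {
        String[] p = split("Hello Mr. Smith went.", 8); // index 8 is the '.' in "Mr."
        System.out.println("prefix=" + p[0] + " suffix=" + p[1]
                + " previous=" + p[2] + " next=" + p[3]);
    }
}
```

For the text "Hello Mr. Smith went." and position 8, this yields prefix "Mr", an empty suffix, previous "Hello", and next "Smith".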


Copyright © 2010. All Rights Reserved.