opennlp.tools.util.featuregen
Class CharacterNgramFeatureGenerator

java.lang.Object
  extended by opennlp.tools.util.featuregen.FeatureGeneratorAdapter
      extended by opennlp.tools.util.featuregen.CharacterNgramFeatureGenerator
All Implemented Interfaces:
AdaptiveFeatureGenerator

public class CharacterNgramFeatureGenerator
extends FeatureGeneratorAdapter

The CharacterNgramFeatureGenerator generates features for each token from the character n-grams that the token contains. The minimum and maximum n-gram length can be specified through the constructor.
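
As a rough illustration of what "character n-grams of a token" means, the following standalone sketch enumerates every substring of a token whose length lies between a minimum and a maximum. The class name CharNgramSketch and the method charNgrams are hypothetical and are not part of OpenNLP. The exact feature strings the real class emits (ordering, any prefix, any case normalization) are not stated on this page, so none are assumed here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not the OpenNLP implementation.
class CharNgramSketch {

    // Collects every character n-gram of the token whose length lies in
    // [minLength, maxLength], shortest n-grams first, scanning left to right.
    static List<String> charNgrams(String token, int minLength, int maxLength) {
        List<String> ngrams = new ArrayList<>();
        for (int n = minLength; n <= maxLength; n++) {
            for (int start = 0; start + n <= token.length(); start++) {
                ngrams.add(token.substring(start, start + n));
            }
        }
        return ngrams;
    }

    public static void main(String[] args) {
        // prints [Pa, ar, ri, is, Par, ari, ris]
        System.out.println(charNgrams("Paris", 2, 3));
    }
}
```

A token shorter than the minimum length yields no n-grams in this sketch, so a one-character token with a minimum of 2 contributes nothing.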


Constructor Summary
CharacterNgramFeatureGenerator()
          Initializes the current instance with a minimum n-gram length of 2 and a maximum n-gram length of 5.
CharacterNgramFeatureGenerator(int minLength, int maxLength)
          Initializes the current instance with the given minimum and maximum n-gram lengths.
 
Method Summary
 void createFeatures(List<String> features, String[] tokens, int index, String[] preds)
          Adds the appropriate features for the token at the specified index with the specified array of previous outcomes to the specified list of features.
 
Methods inherited from class opennlp.tools.util.featuregen.FeatureGeneratorAdapter
clearAdaptiveData, updateAdaptiveData
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CharacterNgramFeatureGenerator

public CharacterNgramFeatureGenerator(int minLength,
                                      int maxLength)
Initializes the current instance with the given minimum and maximum n-gram lengths.

CharacterNgramFeatureGenerator

public CharacterNgramFeatureGenerator()
Initializes the current instance with a minimum n-gram length of 2 and a maximum n-gram length of 5.

Method Detail

createFeatures

public void createFeatures(List<String> features,
                           String[] tokens,
                           int index,
                           String[] preds)
Description copied from interface: AdaptiveFeatureGenerator
Adds the appropriate features for the token at the specified index with the specified array of previous outcomes to the specified list of features.

Parameters:
features - The list of features to be added to.
tokens - The tokens of the sentence or other text unit being processed.
index - The index of the token which is currently being processed.
preds - The outcomes for the tokens prior to the specified index.
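
The calling convention described by these parameters can be sketched without the library on the classpath. In the example below, the Generator interface is a hypothetical stand-in that only mirrors the createFeatures signature documented above, and BIGRAMS is a toy generator, not the OpenNLP class. The sketch assumes the caller invokes the method once per token index and that the generator appends to the list it is given.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the createFeatures calling convention.
class CreateFeaturesContractSketch {

    // Stand-in that mirrors the documented createFeatures signature.
    interface Generator {
        void createFeatures(List<String> features, String[] tokens, int index, String[] preds);
    }

    // Toy generator: appends the character bigrams of tokens[index] and ignores preds.
    static final Generator BIGRAMS = (features, tokens, index, preds) -> {
        String token = tokens[index];
        for (int i = 0; i + 2 <= token.length(); i++) {
            features.add(token.substring(i, i + 2));
        }
    };

    // Calls the generator once per token and collects one feature list per token.
    static List<List<String>> featuresPerToken(Generator generator, String[] tokens) {
        List<List<String>> all = new ArrayList<>();
        // Placeholder for prior outcomes; a real tagger would fill this in as it decodes.
        String[] preds = new String[tokens.length];
        for (int index = 0; index < tokens.length; index++) {
            List<String> features = new ArrayList<>();
            generator.createFeatures(features, tokens, index, preds);
            all.add(features);
        }
        return all;
    }

    public static void main(String[] args) {
        // prints [[Ne, ew], [Yo, or, rk]]
        System.out.println(featuresPerToken(BIGRAMS, new String[] {"New", "York"}));
    }
}
```

With the library available, the same loop would presumably pass a CharacterNgramFeatureGenerator instance in place of the toy generator, since the method signature is identical.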


Copyright © 2010. All Rights Reserved.