opennlp.tools.util.featuregen
Class TokenPatternFeatureGenerator

java.lang.Object
  extended by opennlp.tools.util.featuregen.FeatureGeneratorAdapter
      extended by opennlp.tools.util.featuregen.TokenPatternFeatureGenerator
All Implemented Interfaces:
AdaptiveFeatureGenerator

public class TokenPatternFeatureGenerator
extends FeatureGeneratorAdapter

Partitions tokens into sub-tokens based on character classes and generates class features for each of the sub-tokens and combinations of those sub-tokens.
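
The partitioning idea described above can be sketched in plain Java. This is an illustration only, not the OpenNLP implementation: the class name CharClassPartitionSketch, the three coarse classes (L for letters, D for digits, O for everything else), and the feature prefixes cls=, cls2=, and pattern= are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and feature prefixes are hypothetical.
class CharClassPartitionSketch {

    // Maps a character to a coarse class: L (letter), D (digit), O (other).
    static char classOf(char c) {
        if (Character.isLetter(c)) return 'L';
        if (Character.isDigit(c)) return 'D';
        return 'O';
    }

    // Splits a token into maximal runs of characters that share one class.
    static List<String> partition(String token) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= token.length(); i++) {
            boolean atEnd = i == token.length();
            if (atEnd || classOf(token.charAt(i)) != classOf(token.charAt(start))) {
                parts.add(token.substring(start, i));
                start = i;
            }
        }
        return parts;
    }

    // Emits one class feature per sub-token, one feature per adjacent pair
    // of sub-tokens, and one feature for the class pattern of the whole token.
    static List<String> features(String token) {
        List<String> parts = partition(token);
        List<String> feats = new ArrayList<>();
        StringBuilder pattern = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            char cls = classOf(parts.get(i).charAt(0));
            feats.add("cls=" + cls);
            if (i + 1 < parts.size()) {
                feats.add("cls2=" + cls + classOf(parts.get(i + 1).charAt(0)));
            }
            pattern.append(cls);
        }
        feats.add("pattern=" + pattern);
        return feats;
    }
}
```

With these assumptions, the token AB-123 is split into AB, -, and 123, and the whole-token pattern feature is pattern=LOD. The actual generator delegates the splitting to a Tokenizer and may use different classes and feature names.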


Constructor Summary
TokenPatternFeatureGenerator()
          Initializes a new instance that uses the SimpleTokenizer.
TokenPatternFeatureGenerator(Tokenizer supportTokenizer)
          Initializes a new instance that uses the specified tokenizer.
 
Method Summary
 void createFeatures(List<String> feats, String[] toks, int index, String[] preds)
          Adds the appropriate features for the token at the specified index with the specified array of previous outcomes to the specified list of features.
 
Methods inherited from class opennlp.tools.util.featuregen.FeatureGeneratorAdapter
clearAdaptiveData, updateAdaptiveData
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TokenPatternFeatureGenerator

public TokenPatternFeatureGenerator()
Initializes a new instance. The SimpleTokenizer is used for tokenization.


TokenPatternFeatureGenerator

public TokenPatternFeatureGenerator(Tokenizer supportTokenizer)
Initializes a new instance that uses the specified tokenizer.

Parameters:
supportTokenizer - The tokenizer used to partition each token into sub-tokens.

Method Detail

createFeatures

public void createFeatures(List<String> feats,
                           String[] toks,
                           int index,
                           String[] preds)
Description copied from interface: AdaptiveFeatureGenerator
Adds the appropriate features for the token at the specified index with the specified array of previous outcomes to the specified list of features.

Parameters:
feats - The list of features to be added to.
toks - The tokens of the sentence or other text unit being processed.
index - The index of the token which is currently being processed.
preds - The outcomes for the tokens prior to the specified index.
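
The calling convention described by these parameters can be sketched as follows: the caller supplies a mutable list, and the generator appends features for the token at the given index. To keep the example self-contained, it uses a hypothetical stand-in interface, FeatureSource, that only mirrors the documented createFeatures signature; the class name, the toy LOWERCASE generator, and the w= prefix are likewise assumptions and not part of OpenNLP.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; all names here are hypothetical.
class CreateFeaturesCallSketch {

    // Stand-in with the same shape as the documented createFeatures method.
    interface FeatureSource {
        void createFeatures(List<String> feats, String[] toks, int index, String[] preds);
    }

    // Calls the generator once per token and collects one feature list per index.
    static List<List<String>> collect(FeatureSource gen, String[] toks) {
        List<List<String>> all = new ArrayList<>();
        // Outcomes for earlier tokens; left unset because this sketch does no prediction.
        String[] preds = new String[toks.length];
        for (int i = 0; i < toks.length; i++) {
            List<String> feats = new ArrayList<>(); // the generator appends to this list
            gen.createFeatures(feats, toks, i, preds);
            all.add(feats);
        }
        return all;
    }

    // Toy generator: adds a single lowercased-word feature for the current token.
    static final FeatureSource LOWERCASE =
        (feats, toks, index, preds) -> feats.add("w=" + toks[index].toLowerCase());
}
```

A TokenPatternFeatureGenerator would be called with the same four arguments in place of the toy generator.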


Copyright © 2010. All Rights Reserved.