opennlp.tools.postag
Class POSTaggerME

java.lang.Object
  extended by opennlp.tools.postag.POSTaggerME
All Implemented Interfaces:
POSTagger

public class POSTaggerME
extends Object
implements POSTagger

A part-of-speech tagger that uses maximum entropy. Tries to predict whether words are nouns, verbs, or any of 70 other POS tags depending on their surrounding context.


Field Summary
protected  BeamSearch<String> beam
          The search object used to search over multiple sequences of tags.
protected  POSContextGenerator contextGen
          The feature context generator.
static int DEFAULT_BEAM_SIZE
           
protected  Dictionary ngramDictionary
           
protected  opennlp.model.AbstractModel posModel
          The maximum entropy model to use to evaluate contexts.
protected  int size
          The size of the beam to be used in determining the best sequence of pos tags.
protected  TagDictionary tagDictionary
          Tag dictionary used for restricting words to a fixed set of tags.
protected  boolean useClosedClassTagsFilter
          Indicates whether a filter should be used to check that a tag assignment is to a word outside of a closed class.
 
Constructor Summary
POSTaggerME(opennlp.model.AbstractModel model, Dictionary dict)
          Deprecated. 
POSTaggerME(opennlp.model.AbstractModel model, Dictionary dict, TagDictionary tagdict)
          Deprecated. 
POSTaggerME(opennlp.model.AbstractModel model, POSContextGenerator cg)
          Deprecated. 
POSTaggerME(opennlp.model.AbstractModel model, POSContextGenerator cg, TagDictionary tagdict)
          Deprecated. 
POSTaggerME(opennlp.model.AbstractModel model, TagDictionary tagdict)
          Deprecated. 
POSTaggerME(int beamSize, opennlp.model.AbstractModel model, POSContextGenerator cg, TagDictionary tagdict)
          Deprecated. 
POSTaggerME(POSModel model)
          Initializes the current instance with the provided model and the default beam size of 3.
POSTaggerME(POSModel model, int beamSize, int cacheSize)
          Initializes the current instance with the provided model, beam size, and cache size.
 
Method Summary
 int getNumTags()
          Returns the number of different tags predicted by this model.
 String[] getOrderedTags(List<String> words, List<String> tags, int index)
           
 String[] getOrderedTags(List<String> words, List<String> tags, int index, double[] tprobs)
           
 double[] probs()
          Returns an array with the probabilities for each tag of the last tagged sentence.
 void probs(double[] probs)
          Populates the specified array with the probabilities for each tag of the last tagged sentence.
 String[][] tag(int numTaggings, String[] sentence)
          Returns at most the specified number of taggings for the specified sentence.
 List<String> tag(List<String> sentence)
          Assigns POS tags to the tokens of the sentence.
 String tag(String sentence)
          Assigns POS tags to a sentence of space-delimited tokens.
 String[] tag(String[] sentence)
          Assigns POS tags to the tokens of the sentence.
 Sequence[] topKSequences(List<String> sentence)
           
 Sequence[] topKSequences(String[] sentence)
           
static POSModel train(String languageCode, ObjectStream<POSSample> samples, ModelType modelType, POSDictionary tagDictionary, Dictionary ngramDictionary, int cutoff, int iterations)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

posModel

protected opennlp.model.AbstractModel posModel
The maximum entropy model to use to evaluate contexts.


contextGen

protected POSContextGenerator contextGen
The feature context generator.


tagDictionary

protected TagDictionary tagDictionary
Tag dictionary used for restricting words to a fixed set of tags.
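
As a rough illustration of what "restricting words to a fixed set of tags" can mean, the sketch below filters candidate tags through a lookup table before picking the best one. The class TagDictionarySketch, its put and bestTag methods, and the score map are hypothetical stand-ins written for this example; they are not part of OpenNLP and do not show how POSTaggerME applies its dictionary internally.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in, not OpenNLP code: a dictionary that narrows candidate tags. */
class TagDictionarySketch {

    // Hypothetical lookup table: word -> tags that word may take.
    private final Map<String, String[]> allowed = new HashMap<>();

    void put(String word, String... tags) {
        allowed.put(word, tags);
    }

    /** Returns the allowed tags for a word, or null when the word is not listed. */
    String[] getTags(String word) {
        return allowed.get(word);
    }

    /**
     * Picks the highest-scoring tag. Listed words may only receive one of their
     * dictionary tags; unlisted words may receive any scored tag.
     */
    String bestTag(String word, Map<String, Double> scores) {
        String[] permitted = getTags(word);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (permitted != null && !Arrays.asList(permitted).contains(e.getKey())) {
                continue; // tag is not valid for this word
            }
            if (e.getValue() > bestScore) {
                bestScore = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }
}
```

In this sketch a word listed with only the tag DT is tagged DT even when the scores favor another tag, while a word missing from the table falls back to the unrestricted best score.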


ngramDictionary

protected Dictionary ngramDictionary

useClosedClassTagsFilter

protected boolean useClosedClassTagsFilter
Indicates whether a filter should be used to check that a tag assignment is to a word outside of a closed class.


DEFAULT_BEAM_SIZE

public static final int DEFAULT_BEAM_SIZE
See Also:
Constant Field Values

size

protected int size
The size of the beam to be used in determining the best sequence of pos tags.


beam

protected BeamSearch<String> beam
The search object used to search over multiple sequences of tags.
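
To make the role of the beam and its size more concrete, here is a minimal, generic beam search over tag sequences. It is only a sketch of the general technique: BeamSearchSketch, its Scorer interface, and the Candidate class are hypothetical names introduced for this example and are not the OpenNLP BeamSearch class or its API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of beam search over tag sequences; not the OpenNLP BeamSearch class. */
class BeamSearchSketch {

    /** Hypothetical scoring hook: probability of each tag for token i, given the tags so far. */
    interface Scorer {
        double[] score(String[] tokens, int i, List<String> previousTags);
    }

    /** A partial or complete tagging together with its accumulated log probability. */
    static final class Candidate {
        final List<String> tags;
        final double logProb;

        Candidate(List<String> tags, double logProb) {
            this.tags = tags;
            this.logProb = logProb;
        }
    }

    /** Returns up to beamSize complete taggings, best first. */
    static List<Candidate> search(String[] tokens, String[] tagSet, Scorer scorer, int beamSize) {
        List<Candidate> beam = new ArrayList<>();
        beam.add(new Candidate(new ArrayList<>(), 0.0));
        for (int i = 0; i < tokens.length; i++) {
            List<Candidate> expanded = new ArrayList<>();
            for (Candidate c : beam) {
                double[] probs = scorer.score(tokens, i, c.tags);
                for (int t = 0; t < tagSet.length; t++) {
                    List<String> tags = new ArrayList<>(c.tags);
                    tags.add(tagSet[t]);
                    expanded.add(new Candidate(tags, c.logProb + Math.log(probs[t])));
                }
            }
            // Keep only the beamSize best partial sequences before moving to the next token.
            expanded.sort((a, b) -> Double.compare(b.logProb, a.logProb));
            beam = new ArrayList<>(expanded.subList(0, Math.min(beamSize, expanded.size())));
        }
        return beam;
    }
}
```

With a beam size of 1 this sketch degenerates to greedy tagging; larger sizes keep more alternatives alive at each token at the cost of more scoring calls.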

Constructor Detail

POSTaggerME

public POSTaggerME(POSModel model)
Initializes the current instance with the provided model and the default beam size of 3.

Parameters:
model - The model used for tagging.

POSTaggerME

public POSTaggerME(POSModel model,
                   int beamSize,
                   int cacheSize)
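Initializes the current instance with the provided model, beam size, and cache size.

Parameters:
model - The model used for tagging.
beamSize - The number of alternate taggings considered when tagging.
cacheSize - The size of the cache to use.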

POSTaggerME

@Deprecated
public POSTaggerME(opennlp.model.AbstractModel model,
                              TagDictionary tagdict)
Deprecated. 

Creates a new tagger with the specified model and tag dictionary.

Parameters:
model - The model used for tagging.
tagdict - The tag dictionary used for specifying a set of valid tags.

POSTaggerME

@Deprecated
public POSTaggerME(opennlp.model.AbstractModel model,
                              Dictionary dict)
Deprecated. 

Creates a new tagger with the specified model and n-gram dictionary.

Parameters:
model - The model used for tagging.
dict - The n-gram dictionary used for feature generation.

POSTaggerME

@Deprecated
public POSTaggerME(opennlp.model.AbstractModel model,
                              Dictionary dict,
                              TagDictionary tagdict)
Deprecated. 

Creates a new tagger with the specified model, n-gram dictionary, and tag dictionary.

Parameters:
model - The model used for tagging.
dict - The n-gram dictionary used for feature generation.
tagdict - The dictionary which specifies the valid set of tags for some words.

POSTaggerME

@Deprecated
public POSTaggerME(opennlp.model.AbstractModel model,
                              POSContextGenerator cg)
Deprecated. 

Creates a new tagger with the specified model and context generator.

Parameters:
model - The model used for tagging.
cg - The context generator used for feature creation.

POSTaggerME

@Deprecated
public POSTaggerME(opennlp.model.AbstractModel model,
                              POSContextGenerator cg,
                              TagDictionary tagdict)
Deprecated. 

Creates a new tagger with the specified model, context generator, and tag dictionary.

Parameters:
model - The model used for tagging.
cg - The context generator used for feature creation.
tagdict - The dictionary which specifies the valid set of tags for some words.

POSTaggerME

@Deprecated
public POSTaggerME(int beamSize,
                              opennlp.model.AbstractModel model,
                              POSContextGenerator cg,
                              TagDictionary tagdict)
Deprecated. 

Creates a new tagger with the specified beam size, model, context generator, and tag dictionary.

Parameters:
beamSize - The number of alternate taggings considered when tagging.
model - The model used for tagging.
cg - The context generator used for feature creation.
tagdict - The dictionary which specifies the valid set of tags for some words.
Method Detail

getNumTags

public int getNumTags()
Returns the number of different tags predicted by this model.

Returns:
the number of different tags predicted by this model.

tag

public List<String> tag(List<String> sentence)
Description copied from interface: POSTagger
Assigns POS tags to the tokens of the sentence.

Specified by:
tag in interface POSTagger
Parameters:
sentence - The sentence of tokens to be tagged.
Returns:
a list of pos tags for each token provided in sentence.

tag

public String[] tag(String[] sentence)
Description copied from interface: POSTagger
Assigns POS tags to the tokens of the sentence.

Specified by:
tag in interface POSTagger
Parameters:
sentence - The sentence of tokens to be tagged.
Returns:
an array of pos tags for each token provided in sentence.

tag

public String[][] tag(int numTaggings,
                      String[] sentence)
Returns at most the specified number of taggings for the specified sentence.

Parameters:
numTaggings - The number of taggings to be returned.
sentence - An array of tokens which make up a sentence.
Returns:
At most the specified number of taggings for the specified sentence.

topKSequences

public Sequence[] topKSequences(List<String> sentence)
Specified by:
topKSequences in interface POSTagger

topKSequences

public Sequence[] topKSequences(String[] sentence)
Specified by:
topKSequences in interface POSTagger

probs

public void probs(double[] probs)
Populates the specified array with the probabilities for each tag of the last tagged sentence.

Parameters:
probs - An array to put the probabilities into.

probs

public double[] probs()
Returns an array with the probabilities for each tag of the last tagged sentence.

Returns:
an array with the probabilities for each tag of the last tagged sentence.
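
Both probs variants report on the last tagged sentence, which implies the tagger keeps state between calls. The sketch below illustrates that pattern with a hypothetical stand-in class, LastSentenceProbsSketch, that tags every token NN and records made-up probabilities. It is not OpenNLP code, and the values are placeholders instead of model scores.

```java
import java.util.Arrays;

/** Hypothetical stand-in showing the "last tagged sentence" state behind probs(). */
class LastSentenceProbsSketch {

    // Probabilities of the tags chosen in the most recent tag(...) call.
    private double[] lastProbs = new double[0];

    /** Tags every token "NN" with a placeholder probability and replaces any earlier state. */
    String[] tag(String[] sentence) {
        String[] tags = new String[sentence.length];
        lastProbs = new double[sentence.length];
        for (int i = 0; i < sentence.length; i++) {
            tags[i] = "NN";
            lastProbs[i] = 1.0 / (i + 2); // placeholder value, not a real model score
        }
        return tags;
    }

    /** Returns a copy of the probabilities for the last tagged sentence. */
    double[] probs() {
        return Arrays.copyOf(lastProbs, lastProbs.length);
    }

    /** Copies the probabilities into a caller-supplied array, which must be long enough. */
    void probs(double[] target) {
        System.arraycopy(lastProbs, 0, target, 0, lastProbs.length);
    }
}
```

Under this pattern, calling probs() after a second tag call returns values for the second sentence only, so a caller that needs the probabilities of an earlier sentence would read them before tagging the next one.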

tag

public String tag(String sentence)
Description copied from interface: POSTagger
Assigns POS tags to a sentence of space-delimited tokens.

Specified by:
tag in interface POSTagger
Parameters:
sentence - The sentence of space-delimited tokens to be tagged.
Returns:
a string of space-delimited pos tags for each token provided in sentence.
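
Read literally, the description above amounts to splitting on whitespace, tagging the tokens, and joining the tags with spaces. The following sketch shows that reading. The class SpaceDelimitedTagSketch and its tokenTagger parameter are hypothetical and stand in for a token-level tagger such as tag(String[]); the exact output format of the real method should be checked against the library.

```java
import java.util.function.Function;

/** Hypothetical sketch of a space-delimited convenience wrapper; not the OpenNLP implementation. */
class SpaceDelimitedTagSketch {

    /**
     * Splits the sentence on whitespace, tags the tokens with the supplied
     * token-level tagger, and joins the resulting tags with single spaces.
     */
    static String tag(String sentence, Function<String[], String[]> tokenTagger) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return ""; // nothing to tag
        }
        String[] tokens = trimmed.split("\\s+");
        String[] tags = tokenTagger.apply(tokens);
        return String.join(" ", tags);
    }
}
```

Because the tokens are recovered by a plain whitespace split, input that has not been tokenized beforehand (for example, punctuation attached to words) would be tagged as written.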

getOrderedTags

public String[] getOrderedTags(List<String> words,
                               List<String> tags,
                               int index)

getOrderedTags

public String[] getOrderedTags(List<String> words,
                               List<String> tags,
                               int index,
                               double[] tprobs)

train

public static POSModel train(String languageCode,
                             ObjectStream<POSSample> samples,
                             ModelType modelType,
                             POSDictionary tagDictionary,
                             Dictionary ngramDictionary,
                             int cutoff,
                             int iterations)
                      throws IOException
Throws:
IOException


Copyright © 2010. All Rights Reserved.