opennlp.tools.chunker
Class ChunkerME

java.lang.Object
  extended by opennlp.tools.chunker.ChunkerME
All Implemented Interfaces:
Chunker

public class ChunkerME
extends Object
implements Chunker

The class represents a maximum-entropy-based chunker. Such a chunker can be used to find flat structures, such as noun phrases or named entities, in sequence inputs.


Field Summary
protected  BeamSearch<String> beam
          The beam used to search for sequences of chunk tag assignments.
static int DEFAULT_BEAM_SIZE
           
protected  opennlp.model.MaxentModel model
          The model used to assign chunk tags to a sequence of tokens.
 
Constructor Summary
ChunkerME(ChunkerModel model)
          Initializes the current instance with the specified model.
ChunkerME(ChunkerModel model, int beamSize)
          Initializes the current instance with the specified model and the specified beam size.
ChunkerME(ChunkerModel model, int beamSize, SequenceValidator<String> sequenceValidator)
          Initializes the current instance with the specified model, beam size, and sequence validator.
ChunkerME(ChunkerModel model, int beamSize, SequenceValidator<String> sequenceValidator, ChunkerContextGenerator contextGenerator)
          Initializes the current instance with the specified model, beam size, sequence validator, and context generator.
ChunkerME(opennlp.model.MaxentModel mod)
          Deprecated. 
ChunkerME(opennlp.model.MaxentModel mod, ChunkerContextGenerator cg)
          Deprecated. 
ChunkerME(opennlp.model.MaxentModel mod, ChunkerContextGenerator cg, int beamSize)
          Deprecated. 
 
Method Summary
 List<String> chunk(List<String> toks, List<String> tags)
          Generates chunk tags for the given sequence returning the result in a list.
 String[] chunk(String[] toks, String[] tags)
          Generates chunk tags for the given sequence returning the result in an array.
static void main(String[] args)
          Deprecated. 
 double[] probs()
          Returns an array with the probabilities of the last decoded sequence.
 void probs(double[] probs)
          Populates the specified array with the probabilities of the last decoded sequence.
 Sequence[] topKSequences(List<String> sentence, List<String> tags)
          Returns the top k chunk sequences for the specified sentence with the specified pos-tags.
 Sequence[] topKSequences(String[] sentence, String[] tags, double minSequenceScore)
          Returns the top k chunk sequences for the specified sentence with the specified pos-tags.
static ChunkerModel train(String lang, ObjectStream<ChunkSample> in, int cutoff, int iterations)
          Trains a new model for the ChunkerME.
static ChunkerModel train(String lang, ObjectStream<ChunkSample> in, int cutoff, int iterations, ChunkerContextGenerator contextGenerator)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_BEAM_SIZE

public static final int DEFAULT_BEAM_SIZE
See Also:
Constant Field Values

beam

protected BeamSearch<String> beam
The beam used to search for sequences of chunk tag assignments.
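
To illustrate what a beam of a given size does during decoding, here is a minimal sketch of beam search over tag sequences. It is not OpenNLP's BeamSearch implementation: the class BeamSketch, the Scorer interface, and the toy probabilities are all hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of beam search; not OpenNLP's BeamSearch class.
class BeamSketch {

    // Hypothetical scoring hook: probability of `tag` at position i given the earlier tags.
    interface Scorer {
        double prob(int i, List<String> previous, String tag);
    }

    // A partial tag sequence together with its accumulated log-probability.
    static final class Hypothesis {
        final List<String> tags;
        final double logProb;

        Hypothesis(List<String> tags, double logProb) {
            this.tags = tags;
            this.logProb = logProb;
        }
    }

    // Returns the best-scoring tag sequence found while keeping at most beamSize hypotheses per step.
    static List<String> decode(int length, String[] tagSet, Scorer scorer, int beamSize) {
        List<Hypothesis> beam = new ArrayList<>();
        beam.add(new Hypothesis(new ArrayList<String>(), 0.0));
        for (int i = 0; i < length; i++) {
            List<Hypothesis> next = new ArrayList<>();
            for (Hypothesis h : beam) {
                for (String tag : tagSet) {
                    double p = scorer.prob(i, h.tags, tag);
                    if (p <= 0.0) {
                        continue; // impossible extension, drop it
                    }
                    List<String> extended = new ArrayList<>(h.tags);
                    extended.add(tag);
                    next.add(new Hypothesis(extended, h.logProb + Math.log(p)));
                }
            }
            // Keep only the beamSize highest-scoring hypotheses.
            next.sort(Comparator.comparingDouble((Hypothesis h) -> h.logProb).reversed());
            beam = new ArrayList<>(next.subList(0, Math.min(beamSize, next.size())));
        }
        return beam.isEmpty() ? new ArrayList<String>() : beam.get(0).tags;
    }

    // Toy model in which the locally best first tag ("A") leads to a worse full sequence.
    static Scorer toyScorer() {
        return (i, previous, tag) -> {
            if (i == 0) {
                return tag.equals("A") ? 0.6 : 0.4;
            }
            boolean afterA = previous.get(i - 1).equals("A");
            if (afterA) {
                return tag.equals("A") ? 0.55 : 0.45;
            }
            return tag.equals("A") ? 0.9 : 0.1;
        };
    }

    public static void main(String[] args) {
        String[] tagSet = {"A", "B"};
        System.out.println("beam 1: " + decode(2, tagSet, toyScorer(), 1));
        System.out.println("beam 2: " + decode(2, tagSet, toyScorer(), 2));
    }
}
```

With the toy scorer, a beam of size 1 commits to the locally best first tag and misses the better overall sequence, which a beam of size 2 still finds. Larger beams explore more alternatives at a higher decoding cost.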


model

protected opennlp.model.MaxentModel model
The model used to assign chunk tags to a sequence of tokens.

Constructor Detail

ChunkerME

public ChunkerME(ChunkerModel model,
                 int beamSize,
                 SequenceValidator<String> sequenceValidator,
                 ChunkerContextGenerator contextGenerator)
Initializes the current instance with the specified model, beam size, sequence validator, and context generator.

Parameters:
model - The model for this chunker.
beamSize - The size of the beam that should be used when decoding sequences.
sequenceValidator - The SequenceValidator that determines whether an outcome is valid for the preceding sequence. This can be used to implement constraints on what sequences are valid.
contextGenerator - The context generator to be used with the specified model.
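
As an illustration of the kind of constraint a sequence validator can express, the sketch below rejects an "I-X" tag unless it directly follows "B-X" or "I-X". It assumes BIO-style chunk tags, which this page does not specify. TagConstraint is a hypothetical stand-in defined only for this example; OpenNLP's SequenceValidator interface may differ in method name and signature.

```java
// Hypothetical stand-in for a sequence validator; not OpenNLP's SequenceValidator interface.
interface TagConstraint {
    // Returns true if `candidate` may follow the outcomes chosen so far.
    boolean isValid(String[] previousOutcomes, String candidate);
}

// Assumes BIO-style tags: "B-X" opens a chunk of type X, "I-X" continues it, "O" is outside.
class BioConstraint implements TagConstraint {

    @Override
    public boolean isValid(String[] previousOutcomes, String candidate) {
        if (!candidate.startsWith("I-")) {
            return true; // "B-X" and "O" may follow anything
        }
        if (previousOutcomes.length == 0) {
            return false; // "I-X" cannot open a sequence
        }
        String previous = previousOutcomes[previousOutcomes.length - 1];
        String type = candidate.substring(2);
        // "I-X" must continue a chunk of the same type X.
        return previous.equals("B-" + type) || previous.equals("I-" + type);
    }

    public static void main(String[] args) {
        TagConstraint constraint = new BioConstraint();
        System.out.println(constraint.isValid(new String[] {"B-NP"}, "I-NP"));
        System.out.println(constraint.isValid(new String[] {"O"}, "I-NP"));
    }
}
```

A decoder that consults such a check before extending a hypothesis never emits a continuation tag without a matching chunk start.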

ChunkerME

public ChunkerME(ChunkerModel model,
                 int beamSize,
                 SequenceValidator<String> sequenceValidator)
Initializes the current instance with the specified model, beam size, and sequence validator.

Parameters:
model - The model for this chunker.
beamSize - The size of the beam that should be used when decoding sequences.
sequenceValidator - The SequenceValidator that determines whether an outcome is valid for the preceding sequence. This can be used to implement constraints on what sequences are valid.

ChunkerME

public ChunkerME(ChunkerModel model,
                 int beamSize)
Initializes the current instance with the specified model and the specified beam size.

Parameters:
model - The model for this chunker.
beamSize - The size of the beam that should be used when decoding sequences.

ChunkerME

public ChunkerME(ChunkerModel model)
Initializes the current instance with the specified model. The default beam size is used.

Parameters:
model - The model for this chunker.

ChunkerME

@Deprecated
public ChunkerME(opennlp.model.MaxentModel mod)
Deprecated. 

Creates a chunker using the specified model.

Parameters:
mod - The maximum entropy model for this chunker.

ChunkerME

@Deprecated
public ChunkerME(opennlp.model.MaxentModel mod,
                            ChunkerContextGenerator cg)
Deprecated. 

Creates a chunker using the specified model and context generator.

Parameters:
mod - The maximum entropy model for this chunker.
cg - The context generator to be used by the specified model.

ChunkerME

@Deprecated
public ChunkerME(opennlp.model.MaxentModel mod,
                            ChunkerContextGenerator cg,
                            int beamSize)
Deprecated. 

Creates a chunker using the specified model and context generator and decodes the model using a beam search of the specified size.

Parameters:
mod - The maximum entropy model for this chunker.
cg - The context generator to be used by the specified model.
beamSize - The size of the beam that should be used when decoding sequences.
Method Detail

chunk

public List<String> chunk(List<String> toks,
                          List<String> tags)
Description copied from interface: Chunker
Generates chunk tags for the given sequence returning the result in a list.

Specified by:
chunk in interface Chunker
Parameters:
toks - a list of the tokens or words of the sequence.
tags - a list of the pos tags of the sequence.
Returns:
a list of chunk tags for each token in the sequence.

chunk

public String[] chunk(String[] toks,
                      String[] tags)
Description copied from interface: Chunker
Generates chunk tags for the given sequence returning the result in an array.

Specified by:
chunk in interface Chunker
Parameters:
toks - an array of the tokens or words of the sequence.
tags - an array of the pos tags of the sequence.
Returns:
an array of chunk tags for each token in the sequence.
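
The returned array holds one chunk tag per token, so callers usually regroup tokens into phrases. The sketch below shows one way to do that, assuming the model emits BIO-style tags such as "B-NP", "I-NP", and "O". The tag scheme depends on the trained model and is an assumption here, and ChunkSpans is a hypothetical helper, not part of OpenNLP.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP.
class ChunkSpans {

    // Groups tokens into bracketed phrases, assuming BIO-style tags:
    // "B-X" opens a chunk of type X, "I-X" continues it, "O" is outside any chunk.
    // Simplification: the type of an "I-" tag is not checked against the open chunk.
    static List<String> group(String[] toks, String[] chunkTags) {
        if (toks.length != chunkTags.length) {
            throw new IllegalArgumentException("Need exactly one chunk tag per token");
        }
        List<String> phrases = new ArrayList<>();
        StringBuilder current = null;
        for (int i = 0; i < toks.length; i++) {
            String tag = chunkTags[i];
            if (tag.startsWith("I-") && current != null) {
                current.append(' ').append(toks[i]); // continue the open chunk
                continue;
            }
            if (current != null) {
                phrases.add(current.append(']').toString()); // close the open chunk
                current = null;
            }
            if (tag.startsWith("B-") || tag.startsWith("I-")) {
                // Start a new chunk; a stray "I-" without an open chunk is treated leniently.
                current = new StringBuilder("[" + tag.substring(2) + " " + toks[i]);
            }
        }
        if (current != null) {
            phrases.add(current.append(']').toString());
        }
        return phrases;
    }

    public static void main(String[] args) {
        String[] toks = {"He", "reckons", "the", "current", "account", "deficit"};
        String[] tags = {"B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP"};
        System.out.println(group(toks, tags));
    }
}
```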

topKSequences

public Sequence[] topKSequences(List<String> sentence,
                                List<String> tags)
Description copied from interface: Chunker
Returns the top k chunk sequences for the specified sentence with the specified pos-tags.

Specified by:
topKSequences in interface Chunker
Parameters:
sentence - The tokens of the sentence.
tags - The pos-tags for the specified sentence.
Returns:
the top k chunk sequences for the specified sentence.

topKSequences

public Sequence[] topKSequences(String[] sentence,
                                String[] tags,
                                double minSequenceScore)
Description copied from interface: Chunker
Returns the top k chunk sequences for the specified sentence with the specified pos-tags.

Specified by:
topKSequences in interface Chunker
Parameters:
sentence - The tokens of the sentence.
tags - The pos-tags for the specified sentence.
minSequenceScore - The minimum score a sequence must have to be returned.
Returns:
the top k chunk sequences for the specified sentence.

probs

public void probs(double[] probs)
Populates the specified array with the probabilities of the last decoded sequence. The sequence was determined based on the previous call to chunk. The specified array should be at least as large as the number of tokens in the previous call to chunk.

Parameters:
probs - An array used to hold the probabilities of the last decoded sequence.

probs

public double[] probs()
Returns an array with the probabilities of the last decoded sequence. The sequence was determined based on the previous call to chunk.

Returns:
An array with one probability for each token passed to the most recent call to chunk.
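
One possible use of the per-token probabilities is to flag tags the model was unsure about. The sketch below works on a plain double[] such as the one probs() returns. ChunkConfidence and the threshold value are hypothetical and not part of OpenNLP.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP.
class ChunkConfidence {

    // Returns the indexes of tokens whose tag probability is below the threshold.
    static List<Integer> below(double[] probs, double threshold) {
        List<Integer> indexes = new ArrayList<>();
        for (int i = 0; i < probs.length; i++) {
            if (probs[i] < threshold) {
                indexes.add(i);
            }
        }
        return indexes;
    }

    // Returns the index of the least confident token, or -1 for an empty array.
    static int leastConfident(double[] probs) {
        int best = -1;
        for (int i = 0; i < probs.length; i++) {
            if (best < 0 || probs[i] < probs[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] probs = {0.98, 0.61, 0.93}; // made-up values for illustration
        System.out.println("below 0.7: " + below(probs, 0.7));
        System.out.println("least confident: " + leastConfident(probs));
    }
}
```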

train

public static ChunkerModel train(String lang,
                                 ObjectStream<ChunkSample> in,
                                 int cutoff,
                                 int iterations,
                                 ChunkerContextGenerator contextGenerator)
                          throws IOException
Throws:
IOException

train

public static ChunkerModel train(String lang,
                                 ObjectStream<ChunkSample> in,
                                 int cutoff,
                                 int iterations)
                          throws IOException,
                                 ObjectStreamException
Trains a new model for the ChunkerME.

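Parameters:
lang - The language of the training data.
in - The stream of chunk samples to train on.
cutoff - The minimum number of times a feature must occur in the training data to be included in the model.
iterations - The number of training iterations.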
Returns:
the new model
Throws:
IOException
ObjectStreamException

main

@Deprecated
public static void main(String[] args)
                 throws IOException,
                        ObjectStreamException
Deprecated. 

Trains the chunker using the specified parameters.
Usage: ChunkerME trainingFile modelFile.
Training file should be one word per line where each line consists of a space-delimited triple of "word pos outcome". Sentence breaks are indicated by blank lines.

Parameters:
args - The training file and the model file.
Throws:
IOException - When the specified files cannot be read.
ObjectStreamException
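
The training file layout described for main (one space-delimited "word pos outcome" triple per line, blank lines between sentences) can be read with plain Java. The sketch below is only an illustration of that layout: ChunkTrainingFormat is a hypothetical helper, not part of OpenNLP, and the outcome tags in the sample are assumed values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: parses "word pos outcome" training text.
class ChunkTrainingFormat {

    // Returns one list per sentence; each element is a {word, pos, outcome} triple.
    static List<List<String[]>> parse(String text) {
        List<List<String[]>> sentences = new ArrayList<>();
        List<String[]> current = new ArrayList<>();
        for (String line : text.split("\\r?\\n", -1)) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                // A blank line closes the current sentence.
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
                continue;
            }
            // Tolerates repeated whitespace between the three fields.
            String[] fields = trimmed.split("\\s+");
            if (fields.length != 3) {
                throw new IllegalArgumentException("Expected 'word pos outcome': " + line);
            }
            current.add(fields);
        }
        if (!current.isEmpty()) {
            sentences.add(current); // last sentence without a trailing blank line
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Sample data; the outcome tags are assumed for illustration.
        String sample = "He PRP B-NP\nreckons VBZ B-VP\n\nIt PRP B-NP\nworks VBZ B-VP\n";
        List<List<String[]>> sentences = parse(sample);
        System.out.println(sentences.size() + " sentences, first has "
                + sentences.get(0).size() + " tokens");
    }
}
```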


Copyright © 2010. All Rights Reserved.