opennlp.tools.doccat
Class DocumentCategorizerME

java.lang.Object
  extended by opennlp.tools.doccat.DocumentCategorizerME
All Implemented Interfaces:
DocumentCategorizer

public class DocumentCategorizerME
extends Object
implements DocumentCategorizer

Maxent implementation of DocumentCategorizer.


Constructor Summary
DocumentCategorizerME(DoccatModel model)
          Initializes the current instance with a doccat model.
DocumentCategorizerME(DoccatModel model, FeatureGenerator... featureGenerators)
          Initializes the current instance with a doccat model and custom feature generation.
DocumentCategorizerME(opennlp.model.MaxentModel model)
          Deprecated. Use DocumentCategorizerME(DoccatModel) instead.
DocumentCategorizerME(opennlp.model.MaxentModel model, FeatureGenerator... featureGenerators)
          Deprecated. Use DocumentCategorizerME(DoccatModel, FeatureGenerator...) instead.
 
Method Summary
 double[] categorize(String documentText)
           
 double[] categorize(String[] text)
          Categorizes the given text.
 String getAllResults(double[] results)
           
 String getBestCategory(double[] outcome)
           
 String getCategory(int index)
           
 int getIndex(String category)
           
 int getNumberOfCategories()
           
static opennlp.model.AbstractModel train(DocumentCategorizerEventStream eventStream)
          Deprecated. 
static DoccatModel train(String languageCode, ObjectStream<DocumentSample> samples)
          Trains a doccat model with default feature generation.
static DoccatModel train(String languageCode, ObjectStream<DocumentSample> samples, int cutoff, int iterations, FeatureGenerator... featureGenerators)
          Trains a document categorizer model with custom feature generation.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DocumentCategorizerME

public DocumentCategorizerME(DoccatModel model,
                             FeatureGenerator... featureGenerators)
Initializes the current instance with a doccat model and custom feature generation. The feature generation must be identical to the configuration used at training time.

Parameters:
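model - the doccat model to categorize with
featureGenerators - the feature generators to apply; they must match those used when the model was trained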

DocumentCategorizerME

public DocumentCategorizerME(DoccatModel model)
Initializes the current instance with a doccat model. Default feature generation is used.

Parameters:
model - the doccat model to categorize with

DocumentCategorizerME

@Deprecated
public DocumentCategorizerME(opennlp.model.MaxentModel model)
Deprecated. Use DocumentCategorizerME(DoccatModel) instead.

Initializes the current instance with the given MaxentModel.

Parameters:
model -

DocumentCategorizerME

@Deprecated
public DocumentCategorizerME(opennlp.model.MaxentModel model,
                                        FeatureGenerator... featureGenerators)
Deprecated. Use DocumentCategorizerME(DoccatModel, FeatureGenerator...) instead.

Initializes the current instance with the given MaxentModel and FeatureGenerators.

Parameters:
model -
featureGenerators -

Method Detail

categorize

public double[] categorize(String[] text)
Categorizes the given text.

Specified by:
categorize in interface DocumentCategorizer
Parameters:
text - the text of the document to categorize, as an array of strings

categorize

public double[] categorize(String documentText)
Specified by:
categorize in interface DocumentCategorizer

getBestCategory

public String getBestCategory(double[] outcome)
Specified by:
getBestCategory in interface DocumentCategorizer
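
This page does not describe how the best category is derived from the outcome array. The following is a minimal sketch, assuming that the array holds one score per category and that the best category is the one with the highest score. BestCategorySketch and bestIndex are hypothetical names used for illustration; they are not part of OpenNLP.

```java
public final class BestCategorySketch {

    private BestCategorySketch() {
    }

    /**
     * Returns the index of the largest score, or -1 if the array is empty.
     * Assumption: a higher score means a more likely category.
     * On ties, the first (lowest) index wins.
     */
    public static int bestIndex(double[] outcome) {
        int best = -1;
        for (int i = 0; i < outcome.length; i++) {
            if (best < 0 || outcome[i] > outcome[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] outcome = {0.1, 0.7, 0.2};
        System.out.println(bestIndex(outcome)); // prints 1
    }
}
```

Under the same assumption, passing the resulting index to getCategory(int) would yield the category name.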

getIndex

public int getIndex(String category)
Specified by:
getIndex in interface DocumentCategorizer

getCategory

public String getCategory(int index)
Specified by:
getCategory in interface DocumentCategorizer

getNumberOfCategories

public int getNumberOfCategories()
Specified by:
getNumberOfCategories in interface DocumentCategorizer
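
The accessors getNumberOfCategories() and getCategory(int) can be combined to label each entry of a result array. The following sketch assumes that position i of the array returned by categorize corresponds to getCategory(i); this page does not state that explicitly. CategoryLookup is a hypothetical stand-in that mirrors the two method signatures listed here, so the example compiles without OpenNLP on the classpath. CategoryScoresSketch, lookupOf and scoresByCategory are likewise illustrative names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class CategoryScoresSketch {

    private CategoryScoresSketch() {
    }

    /** Hypothetical stand-in mirroring two accessors from this page. */
    public interface CategoryLookup {
        int getNumberOfCategories();

        String getCategory(int index);
    }

    /** Builds a toy lookup over a fixed list of category names. */
    public static CategoryLookup lookupOf(final String... names) {
        return new CategoryLookup() {
            @Override
            public int getNumberOfCategories() {
                return names.length;
            }

            @Override
            public String getCategory(int index) {
                return names[index];
            }
        };
    }

    /**
     * Pairs each category name with the score at the same index.
     * Assumption: scores[i] belongs to lookup.getCategory(i).
     */
    public static Map<String, Double> scoresByCategory(CategoryLookup lookup, double[] scores) {
        int n = lookup.getNumberOfCategories();
        if (scores.length != n) {
            throw new IllegalArgumentException(
                "expected " + n + " scores but got " + scores.length);
        }
        // LinkedHashMap keeps the categories in index order.
        Map<String, Double> result = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            result.put(lookup.getCategory(i), scores[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        CategoryLookup lookup = lookupOf("sports", "politics");
        System.out.println(scoresByCategory(lookup, new double[] {0.25, 0.75}));
    }
}
```

With a real categorizer, the same loop would presumably be written directly against the DocumentCategorizer instance instead of the stand-in interface.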

getAllResults

public String getAllResults(double[] results)
Specified by:
getAllResults in interface DocumentCategorizer

train

@Deprecated
public static opennlp.model.AbstractModel train(DocumentCategorizerEventStream eventStream)
                                         throws IOException
Deprecated. 

Trains a new model for the DocumentCategorizerME.

Parameters:
eventStream -
Returns:
the new model
Throws:
IOException

train

public static DoccatModel train(String languageCode,
                                ObjectStream<DocumentSample> samples,
                                int cutoff,
                                int iterations,
                                FeatureGenerator... featureGenerators)
                         throws IOException
Trains a document categorizer model with custom feature generation.

Parameters:
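languageCode - the language code of the training data
samples - the stream of document samples to train on
cutoff - the training cutoff
iterations - the number of training iterations
featureGenerators - the feature generators to use; the same generators must be supplied when the model is later used for categorization
Returns:
the trained DoccatModel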
Throws:
IOException

train

public static DoccatModel train(String languageCode,
                                ObjectStream<DocumentSample> samples)
                         throws IOException
Trains a doccat model with default feature generation.

Parameters:
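languageCode - the language code of the training data
samples - the stream of document samples to train on
Returns:
the trained DoccatModel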
Throws:
IOException
ObjectStreamException


Copyright © 2010. All Rights Reserved.