opennlp.tools.ngram
Class NGramModel

java.lang.Object
  extended by opennlp.tools.ngram.NGramModel
All Implemented Interfaces:
Iterable<StringList>

public class NGramModel
extends Object
implements Iterable<StringList>

The NGramModel can be used to create ngrams and character ngrams.

See Also:
StringList

Field Summary
protected static String COUNT
           
 
Constructor Summary
NGramModel()
          Initializes an empty instance.
NGramModel(InputStream in)
          Initializes the current instance.
 
Method Summary
 void add(String chars, int minLength, int maxLength)
          Adds character NGrams to the current instance.
 void add(StringList ngram)
          Adds one NGram; if it already exists, its count is increased by one.
 void add(StringList ngram, int minLength, int maxLength)
          Adds NGrams up to the specified length to the current instance.
 boolean contains(StringList tokens)
          Checks if the given tokens are contained by the current instance.
 void cutoff(int cutoffUnder, int cutoffOver)
          Deletes all ngrams which appear less often than the cutoffUnder value or more often than the cutoffOver value.
 boolean equals(Object obj)
           
 int getCount(StringList ngram)
          Retrieves the count of the given ngram.
 int hashCode()
           
 Iterator<StringList> iterator()
          Retrieves an Iterator over all StringList entries.
 int numberOfGrams()
          Retrieves the total count of all Ngrams.
 void remove(StringList tokens)
          Removes the specified tokens from the NGram model; they are just dropped.
 void serialize(OutputStream out)
          Writes the ngram instance to the given OutputStream.
 void setCount(StringList ngram, int count)
          Sets the count of an existing ngram.
 int size()
          Retrieves the number of StringList entries in the current instance.
 Dictionary toDictionary()
          Creates a dictionary which contains all StringLists which are in the current NGramModel.
 Dictionary toDictionary(boolean caseSensitive)
          Creates a dictionary which contains all StringLists which are in the current NGramModel.
 String toString()
           
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

COUNT

protected static final String COUNT
See Also:
Constant Field Values
Constructor Detail

NGramModel

public NGramModel()
Initializes an empty instance.


NGramModel

public NGramModel(InputStream in)
           throws IOException,
                  InvalidFormatException
Initializes the current instance.

Parameters:
in -
Throws:
IOException
InvalidFormatException
Method Detail

getCount

public int getCount(StringList ngram)
Retrieves the count of the given ngram.

Parameters:
ngram -
Returns:
count of the ngram or 0 if it is not contained

setCount

public void setCount(StringList ngram,
                     int count)
Sets the count of an existing ngram.

Parameters:
ngram -
count -

add

public void add(StringList ngram)
Adds one NGram; if it already exists, its count is increased by one.

Parameters:
ngram -

add

public void add(StringList ngram,
                int minLength,
                int maxLength)
Adds NGrams up to the specified length to the current instance.

Parameters:
ngram - the tokens to build the uni-grams, bi-grams, tri-grams, .. from.
minLength - minimal length
maxLength - maximal length
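
The expansion described above (uni-grams, bi-grams, tri-grams, .. built from one token sequence) can be illustrated with a minimal, standalone sketch. It does not use the OpenNLP classes: the class name TokenNGramSketch and the method tokenNGrams are hypothetical, tokens are modelled as a plain List<String> rather than a StringList, and the argument check is an assumption rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not part of opennlp.tools.ngram.
final class TokenNGramSketch {

    // Returns every contiguous run of tokens whose length lies in
    // [minLength, maxLength], shortest runs first.
    static List<List<String>> tokenNGrams(List<String> tokens, int minLength, int maxLength) {
        // Assumption: lengths must be positive and ordered.
        if (minLength < 1 || minLength > maxLength) {
            throw new IllegalArgumentException("require 1 <= minLength <= maxLength");
        }
        List<List<String>> grams = new ArrayList<>();
        for (int length = minLength; length <= maxLength; length++) {
            for (int start = 0; start + length <= tokens.size(); start++) {
                // Copy the window so each ngram is independent of the input list.
                grams.add(new ArrayList<>(tokens.subList(start, start + length)));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(tokenNGrams(Arrays.asList("the", "quick", "fox"), 1, 2));
    }
}
```

With the tokens "the", "quick", "fox" and lengths 1 to 2, the sketch prints [[the], [quick], [fox], [the, quick], [quick, fox]].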

add

public void add(String chars,
                int minLength,
                int maxLength)
Adds character NGrams to the current instance.

Parameters:
chars - the characters to build the character ngrams from
minLength - minimal length
maxLength - maximal length
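
As a rough illustration of what character ngrams are, the following standalone sketch slides a window of each length over a string. It is not the library's implementation: CharNGramSketch and charNGrams are hypothetical names, and any normalisation the real method may apply (for example to letter case) is not modelled here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of opennlp.tools.ngram.
final class CharNGramSketch {

    // Returns every substring of chars whose length lies in
    // [minLength, maxLength], shortest substrings first.
    static List<String> charNGrams(String chars, int minLength, int maxLength) {
        // Assumption: lengths must be positive and ordered.
        if (minLength < 1 || minLength > maxLength) {
            throw new IllegalArgumentException("require 1 <= minLength <= maxLength");
        }
        List<String> grams = new ArrayList<>();
        for (int length = minLength; length <= maxLength; length++) {
            for (int start = 0; start + length <= chars.length(); start++) {
                grams.add(chars.substring(start, start + length));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(charNGrams("abc", 2, 3));
    }
}
```

For the input "abc" with lengths 2 to 3, the sketch prints [ab, bc, abc].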

remove

public void remove(StringList tokens)
Removes the specified tokens from the NGram model; they are just dropped.

Parameters:
tokens -

contains

public boolean contains(StringList tokens)
Checks if the given tokens are contained by the current instance.

Parameters:
tokens -
Returns:
true if the ngram is contained

size

public int size()
Retrieves the number of StringList entries in the current instance.

Returns:
number of different grams

iterator

public Iterator<StringList> iterator()
Retrieves an Iterator over all StringList entries.

Specified by:
iterator in interface Iterable<StringList>
Returns:
iterator over all grams

numberOfGrams

public int numberOfGrams()
Retrieves the total count of all Ngrams.

Returns:
total count of all ngrams
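
The difference between size() (number of different grams) and numberOfGrams() (total count of all ngrams) can be shown with a small counting sketch. This is a standalone illustration under stated assumptions, not the OpenNLP class: NGramCountSketch is a hypothetical name, and a HashMap keyed by List<String> stands in for whatever storage the real model uses.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not part of opennlp.tools.ngram.
final class NGramCountSketch {

    // Assumption: one counter per distinct ngram.
    private final Map<List<String>, Integer> counts = new HashMap<>();

    // Adds one ngram; an existing entry has its count increased by one.
    void add(List<String> ngram) {
        counts.merge(ngram, 1, Integer::sum);
    }

    // Count of the ngram, or 0 if it is not contained.
    int getCount(List<String> ngram) {
        return counts.getOrDefault(ngram, 0);
    }

    // Number of different ngrams.
    int size() {
        return counts.size();
    }

    // Sum of the counts of all ngrams.
    int numberOfGrams() {
        int total = 0;
        for (int count : counts.values()) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        NGramCountSketch model = new NGramCountSketch();
        model.add(Arrays.asList("a", "b"));
        model.add(Arrays.asList("a", "b"));
        model.add(Arrays.asList("c"));
        System.out.println(model.size() + " distinct, " + model.numberOfGrams() + " total");
    }
}
```

After adding the ngram ("a", "b") twice and ("c") once, the sketch reports 2 distinct entries and a total count of 3.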

cutoff

public void cutoff(int cutoffUnder,
                   int cutoffOver)
Deletes all ngrams which appear less often than the cutoffUnder value or more often than the cutoffOver value.

Parameters:
cutoffUnder - ngrams with a count below this value are deleted
cutoffOver - ngrams with a count above this value are deleted
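
A minimal sketch of the pruning rule follows, assuming the two bounds act independently: an entry is dropped when its count is below cutoffUnder or above cutoffOver, and counts equal to either bound are kept. CutoffSketch is a hypothetical name, and a Map from ngram text to count stands in for the model.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not part of opennlp.tools.ngram.
final class CutoffSketch {

    // Removes every entry whose count lies outside [cutoffUnder, cutoffOver].
    // Assumption: both bounds are inclusive for the entries that survive.
    static void cutoff(Map<String, Integer> counts, int cutoffUnder, int cutoffOver) {
        counts.values().removeIf(count -> count < cutoffUnder || count > cutoffOver);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("rare gram", 1);
        counts.put("useful gram", 5);
        counts.put("stop gram", 50);
        cutoff(counts, 2, 10);
        System.out.println(counts);
    }
}
```

With bounds 2 and 10, only the entry with count 5 survives, so the sketch prints {useful gram=5}.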

toDictionary

public Dictionary toDictionary()
Creates a dictionary which contains all StringLists which are in the current NGramModel. Entries which differ only in case are merged into one. Calling this method is the same as calling #toDictionary(false).

Returns:
a dictionary of the ngrams

toDictionary

public Dictionary toDictionary(boolean caseSensitive)
Creates a dictionary which contains all StringLists which are in the current NGramModel.

Parameters:
caseSensitive - Specifies whether case distinctions should be kept in the creation of the dictionary.
Returns:
a dictionary of the ngrams
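
The effect of the caseSensitive flag can be pictured with the following standalone sketch. It does not use the Dictionary class: CaseMergeSketch and toEntries are hypothetical names, and keeping the first-seen spelling of a merged entry is an assumption made only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration, not part of opennlp.tools.ngram.
final class CaseMergeSketch {

    // Collects the distinct ngrams. When caseSensitive is false, entries that
    // differ only in letter case collapse into one (first spelling kept).
    static List<List<String>> toEntries(List<List<String>> ngrams, boolean caseSensitive) {
        Map<List<String>, List<String>> byKey = new LinkedHashMap<>();
        for (List<String> ngram : ngrams) {
            List<String> key = new ArrayList<>();
            for (String token : ngram) {
                key.add(caseSensitive ? token : token.toLowerCase(Locale.ROOT));
            }
            byKey.putIfAbsent(key, ngram);
        }
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<List<String>> ngrams = Arrays.asList(
                Arrays.asList("New", "York"),
                Arrays.asList("new", "york"),
                Arrays.asList("Paris"));
        System.out.println(toEntries(ngrams, true).size());
        System.out.println(toEntries(ngrams, false).size());
    }
}
```

For the three sample ngrams, the case-sensitive call keeps 3 entries and the case-insensitive call keeps 2.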

serialize

public void serialize(OutputStream out)
               throws IOException
Writes the ngram instance to the given OutputStream.

Parameters:
out - the OutputStream to write the ngram instance to
Throws:
IOException - if an I/O error occurs during writing

equals

public boolean equals(Object obj)
Overrides:
equals in class Object

toString

public String toString()
Overrides:
toString in class Object

hashCode

public int hashCode()
Overrides:
hashCode in class Object


Copyright © 2010. All Rights Reserved.