java.lang.Object
  extended by opennlp.tools.sentdetect.SentenceDetectorME
public class SentenceDetectorME
extends Object
implements SentenceDetector
A sentence detector for splitting raw text into sentences.
A maximum entropy model evaluates each occurrence of the characters ".", "!", and "?" in a string to determine whether it marks the end of a sentence.
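The candidate-then-classify idea described above can be sketched in plain Java. This is a hypothetical illustration, not OpenNLP code: EosSketch, candidates and splits are made-up names, and the IntToDoubleFunction scorer is an assumed stand-in for the maximum entropy model that returns the probability that the character at a given index ends a sentence. With only two outcomes, choosing the likelier one amounts to keeping candidates whose split probability exceeds 0.5.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch, not OpenNLP code: find candidate end-of-sentence
// characters, then keep those a scorer (standing in for the maxent model) accepts.
class EosSketch {

    // Index of every '.', '!' or '?' in s, in order of appearance.
    static List<Integer> candidates(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '.' || c == '!' || c == '?') {
                out.add(i);
            }
        }
        return out;
    }

    // pSplit is an assumed stand-in for the model: it maps a candidate index to
    // the probability that a sentence ends there. With two outcomes, taking the
    // likelier one means keeping candidates whose probability exceeds 0.5.
    static List<Integer> splits(String s, IntToDoubleFunction pSplit) {
        List<Integer> out = new ArrayList<>();
        for (int i : candidates(s)) {
            if (pSplit.applyAsDouble(i) > 0.5) {
                out.add(i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Dr. Smith arrived. Did he stay? Yes!";
        // Toy scorer: a candidate directly after "Dr" is unlikely to end a sentence.
        IntToDoubleFunction toy = i -> (i >= 2 && text.startsWith("Dr", i - 2)) ? 0.1 : 0.9;
        System.out.println(candidates(text));
        System.out.println(splits(text, toy));
    }
}
```

Here the toy scorer rejects the period in "Dr.", so splits returns the indices 17, 30 and 35, while candidates also contains index 2.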
Field Summary | |
---|---|
static String | NO_SPLIT: Constant indicates no sentence split. |
static String | SPLIT: Constant indicates a sentence split. |
protected boolean | useTokenEnd |
Constructor Summary | |
---|---|
SentenceDetectorME(SentenceModel model) | Initializes the current instance. |
SentenceDetectorME(SentenceModel model, Factory factory) | |
Method Summary | |
---|---|
double[] | getSentenceProbabilities(): Returns the probabilities associated with the most recent call to sentDetect(). |
protected boolean | isAcceptableBreak(String s, int fromIndex, int candidateIndex): Allows subclasses to stop an overzealous (read: poorly trained) model from flagging obvious non-breaks as breaks, based on some boolean determination of a break's acceptability. |
static void | main(String[] args): Trains a new sentence detection model. |
String[] | sentDetect(String s): Detects sentences in a String. |
Span[] | sentPosDetect(String s): Detects the positions of the sentences in a String. |
static SentenceModel | train(String languageCode, ObjectStream<SentenceSample> samples, boolean useTokenEnd, Dictionary abbreviations) |
static SentenceModel | train(String languageCode, ObjectStream<SentenceSample> samples, boolean useTokenEnd, Dictionary abbreviations, int cutoff, int iterations) |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final String SPLIT
public static final String NO_SPLIT
protected boolean useTokenEnd
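This page leaves useTokenEnd undocumented. One plausible reading, assumed here rather than confirmed by the page, is that it decides where the next sentence begins after an accepted end-of-sentence character: immediately after that character (false) or only after the end of the token containing it (true). The difference shows up with a period followed by a closing quotation mark. TokenEndSketch and its methods are hypothetical names.

```java
// Hypothetical sketch of the behavior useTokenEnd is ASSUMED to control:
// where the next sentence starts after an end-of-sentence character at index cint.
// This illustrates the assumption; it is not OpenNLP's implementation.
class TokenEndSketch {

    // Index of the first whitespace character at or after 'from' (or s.length()).
    static int firstWhitespace(String s, int from) {
        int i = from;
        while (i < s.length() && !Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        return i;
    }

    // Index of the first non-whitespace character at or after 'from' (or s.length()).
    static int firstNonWhitespace(String s, int from) {
        int i = from;
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        return i;
    }

    // false: resume right after the character; true: skip to the end of its token first.
    static int nextSentenceStart(String s, int cint, boolean useTokenEnd) {
        int from = useTokenEnd ? firstWhitespace(s, cint + 1) : cint + 1;
        return firstNonWhitespace(s, from);
    }

    public static void main(String[] args) {
        String s = "He said \"Stop.\" Then he left.";
        int cint = s.indexOf('.');  // the period inside the quotation
        System.out.println(s.substring(nextSentenceStart(s, cint, false)));
        System.out.println(s.substring(nextSentenceStart(s, cint, true)));
    }
}
```

For the sample string, the false setting starts the next sentence at the stray closing quotation mark (index 14), while true starts it at "Then" (index 16).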
Constructor Detail |
---|
public SentenceDetectorME(SentenceModel model)
Parameters:
model - the SentenceModel
public SentenceDetectorME(SentenceModel model, Factory factory)
Method Detail |
---|
public String[] sentDetect(String s)
Specified by:
sentDetect in interface SentenceDetector
Parameters:
s - The string to be processed.
public Span[] sentPosDetect(String s)
Specified by:
sentPosDetect in interface SentenceDetector
Parameters:
s - The string to be processed.
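sentPosDetect returns character offsets rather than strings. The sketch below assumes each span covers a sentence from an inclusive start offset to an exclusive end offset, and shows how sentence strings could be recovered from such offsets. SpanSketch and SimpleSpan are hypothetical stand-ins, not opennlp.tools.util.Span.

```java
// Hypothetical sketch: recover sentence strings from (start, end) offsets.
// SimpleSpan is a made-up stand-in for a span type; start is inclusive, end exclusive.
class SpanSketch {

    static final class SimpleSpan {
        final int start;  // inclusive
        final int end;    // exclusive

        SimpleSpan(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Cut each covered substring out of the original text.
    static String[] toStrings(SimpleSpan[] spans, String text) {
        String[] out = new String[spans.length];
        for (int i = 0; i < spans.length; i++) {
            out[i] = text.substring(spans[i].start, spans[i].end);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "It rained. We stayed in.";
        SimpleSpan[] spans = { new SimpleSpan(0, 10), new SimpleSpan(11, 24) };
        for (String sentence : toStrings(spans, text)) {
            System.out.println(sentence);
        }
    }
}
```

Keeping offsets instead of copies lets a caller map each sentence back to its exact position in the source text.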
public double[] getSentenceProbabilities()
Returns the probabilities associated with the most recent call to sentDetect().
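The page does not state how the entries returned by getSentenceProbabilities() line up with the detected sentences, so the hypothetical helper below makes no such assumption: it only reports which positions in the array fall below a caller-chosen threshold, for example to route uncertain boundaries to manual review. LowConfidence, below, the threshold and the sample probabilities are all illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: report which entries of a probability array (such as the
// one getSentenceProbabilities() returns) fall below a chosen threshold.
class LowConfidence {

    static List<Integer> below(double[] probs, double threshold) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < probs.length; i++) {
            if (probs[i] < threshold) {
                out.add(i);  // position in the array, not a character offset
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] probs = {0.98, 0.61, 0.99};  // made-up values for illustration
        System.out.println(below(probs, 0.75));  // [1]
    }
}
```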
protected boolean isAcceptableBreak(String s, int fromIndex, int candidateIndex)
The implementation here always returns true, which means that the MaxentModel's outcome is taken as is.
Parameters:
s - the string in which the break occurred.
fromIndex - the start of the segment currently being evaluated.
candidateIndex - the index of the candidate sentence ending.
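Because the default implementation accepts every break, a subclass could veto breaks after known abbreviations, as in this sketch. AbbreviationVeto is a hypothetical standalone class that copies the method's parameter list so it compiles without OpenNLP on the classpath; in real use the check would sit in an override inside a SentenceDetectorME subclass. The abbreviation set is an assumption for illustration.

```java
import java.util.Set;

// Hypothetical sketch of a check a subclass might put in
// isAcceptableBreak(s, fromIndex, candidateIndex): reject the break when the
// token ending at the candidate is a known abbreviation.
class AbbreviationVeto {

    private final Set<String> abbreviations;

    AbbreviationVeto(Set<String> abbreviations) {
        this.abbreviations = abbreviations;
    }

    boolean isAcceptableBreak(String s, int fromIndex, int candidateIndex) {
        // Walk back from the candidate to the start of its token,
        // but never past the start of the current segment.
        int start = candidateIndex;
        while (start > fromIndex && !Character.isWhitespace(s.charAt(start - 1))) {
            start--;
        }
        String token = s.substring(start, candidateIndex + 1);  // includes the '.'
        return !abbreviations.contains(token);
    }

    public static void main(String[] args) {
        AbbreviationVeto veto = new AbbreviationVeto(Set.of("Dr.", "Mr."));
        String s = "Ask Dr. Lee. She knows.";
        System.out.println(veto.isAcceptableBreak(s, 0, s.indexOf('.')));     // false: "Dr." is vetoed
        System.out.println(veto.isAcceptableBreak(s, 0, s.indexOf('.', 7)));  // true: "Lee." is kept
    }
}
```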
public static SentenceModel train(String languageCode, ObjectStream<SentenceSample> samples, boolean useTokenEnd, Dictionary abbreviations) throws IOException
Throws:
IOException
public static SentenceModel train(String languageCode, ObjectStream<SentenceSample> samples, boolean useTokenEnd, Dictionary abbreviations, int cutoff, int iterations) throws IOException
Throws:
IOException
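Neither train overload documents cutoff or iterations. In maximum entropy training, cutoff conventionally sets how often a feature must be observed before it is used, and iterations sets how many training iterations run; both readings are assumptions here. The sketch illustrates only the cutoff step, treating it as an inclusive threshold; CutoffSketch and the feature strings are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of an ASSUMED cutoff rule: keep a feature only if it was
// observed at least `cutoff` times in the training events.
class CutoffSketch {

    static Set<String> keptFeatures(List<String> observed, int cutoff) {
        Map<String, Integer> counts = new HashMap<>();
        for (String feature : observed) {
            counts.merge(feature, 1, Integer::sum);  // count occurrences
        }
        Set<String> kept = new TreeSet<>();  // sorted for stable printing
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= cutoff) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up feature strings for three candidate positions.
        List<String> observed = List.of(
                "prefix=Dr", "next=uppercase",
                "prefix=arrived", "next=uppercase",
                "prefix=Dr", "next=lowercase");
        System.out.println(keptFeatures(observed, 2));  // [next=uppercase, prefix=Dr]
    }
}
```

Raising the cutoff discards rare features, which shrinks the model at the risk of ignoring informative but infrequent contexts.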
public static void main(String[] args) throws IOException
Trains a new sentence detection model.
Usage: opennlp.tools.sentdetect.SentenceDetectorME data_file new_model_name (iterations cutoff)?
Parameters:
args - the command-line arguments, as given in the usage line above.
Throws:
IOException
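The usage line makes iterations and cutoff optional as a pair and lists them in the opposite order from the train(..., int cutoff, int iterations) overload. The hypothetical TrainArgs parser below follows that usage line; it is not the class's own argument handling, the file names in main are made up, and it leaves default values to the caller rather than guessing them.

```java
import java.util.OptionalInt;

// Hypothetical parser for the documented usage line:
//   data_file new_model_name (iterations cutoff)?
class TrainArgs {

    final String dataFile;
    final String modelName;
    final OptionalInt iterations;  // empty when omitted
    final OptionalInt cutoff;      // empty when omitted

    private TrainArgs(String dataFile, String modelName, OptionalInt iterations, OptionalInt cutoff) {
        this.dataFile = dataFile;
        this.modelName = modelName;
        this.iterations = iterations;
        this.cutoff = cutoff;
    }

    static TrainArgs parse(String[] args) {
        if (args.length == 2) {
            return new TrainArgs(args[0], args[1], OptionalInt.empty(), OptionalInt.empty());
        }
        if (args.length == 4) {
            // The usage line lists iterations before cutoff, the reverse of the
            // train(..., int cutoff, int iterations) parameter order.
            return new TrainArgs(args[0], args[1],
                    OptionalInt.of(Integer.parseInt(args[2])),
                    OptionalInt.of(Integer.parseInt(args[3])));
        }
        throw new IllegalArgumentException(
                "Usage: opennlp.tools.sentdetect.SentenceDetectorME data_file new_model_name (iterations cutoff)?");
    }

    public static void main(String[] args) {
        TrainArgs parsed = parse(new String[] {"sentences.txt", "model.bin", "100", "5"});
        System.out.println(parsed.iterations.getAsInt() + " iterations, cutoff " + parsed.cutoff.getAsInt());
    }
}
```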