SentenceDetectorME (OpenNLP Tools 1.5.0 API)

opennlp.tools.sentdetect
Class SentenceDetectorME

java.lang.Object
  opennlp.tools.sentdetect.SentenceDetectorME

All Implemented Interfaces: SentenceDetector

public class SentenceDetectorME
extends Object
implements SentenceDetector

A sentence detector for splitting up raw text into sentences.

A maximum entropy model is used to evaluate the characters ".", "!", and "?" in a string to determine if they signify the end of a sentence.
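As a rough illustration of the first step described above, the sketch below collects the offsets of the ".", "!", and "?" characters in a string. These are the candidate positions a model would then have to accept or reject. The class `EosCandidates` and its `find` method are hypothetical names introduced only for this example; they are not part of OpenNLP, and the sketch does not involve a maximum entropy model at all.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: lists the offsets of the
// characters that the class description names as possible sentence ends.
final class EosCandidates {
    private static final String EOS_CHARS = ".!?";

    // Returns the index of every '.', '!' or '?' in the text, in order.
    static List<Integer> find(String text) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (EOS_CHARS.indexOf(text.charAt(i)) >= 0) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // The period after "Mr" is a candidate too; deciding that it is
        // not a real sentence end is the job of the trained model.
        System.out.println(find("Mr. Smith left. Why?"));
    }
}
```

Not every candidate is a sentence boundary, which is why each one is scored by the model rather than split on directly.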

Field Summary
`static String`	`NO_SPLIT` Constant indicates no sentence split.
`static String`	`SPLIT` Constant indicates a sentence split.
`protected boolean`	`useTokenEnd`

Constructor Summary
`SentenceDetectorME(SentenceModel model)` Initializes the current instance.
`SentenceDetectorME(SentenceModel model, Factory factory)`

Method Summary
`double[]`	`getSentenceProbabilities()` Returns the probabilities associated with the most recent call to sentDetect().
`protected boolean`	`isAcceptableBreak(String s, int fromIndex, int candidateIndex)` Allows subclasses to keep an overzealous (read: poorly trained) model from flagging obvious non-breaks as breaks, based on some boolean determination of a break's acceptability.
`static void`	`main(String[] args)` Trains a new sentence detection model.
`String[]`	`sentDetect(String s)` Detect sentences in a String.
`Span[]`	`sentPosDetect(String s)` Detects the positions of the sentences in a String.
`static SentenceModel`	`train(String languageCode, ObjectStream<SentenceSample> samples, boolean useTokenEnd, Dictionary abbreviations)`
`static SentenceModel`	`train(String languageCode, ObjectStream<SentenceSample> samples, boolean useTokenEnd, Dictionary abbreviations, int cutoff, int iterations)`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

SPLIT

public static final String SPLIT

Constant indicates a sentence split.

See Also: Constant Field Values

NO_SPLIT

public static final String NO_SPLIT

Constant indicates no sentence split.

See Also: Constant Field Values

useTokenEnd

protected boolean useTokenEnd

Constructor Detail

SentenceDetectorME

public SentenceDetectorME(SentenceModel model)

Initializes the current instance.

Parameters: model - the SentenceModel

SentenceDetectorME

public SentenceDetectorME(SentenceModel model,
                          Factory factory)

Method Detail

sentDetect

public String[] sentDetect(String s)

Detect sentences in a String.

Specified by: sentDetect in interface SentenceDetector

Parameters: s - The string to be processed.
Returns: A string array containing individual sentences as elements.

sentPosDetect

public Span[] sentPosDetect(String s)

Detects the positions of the sentences in a String.

Specified by: sentPosDetect in interface SentenceDetector

Parameters: s - The string to be processed.
Returns: A Span array containing the position of every detected sentence.
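To show how position-based results relate to the string-based results of sentDetect, here is a minimal sketch that cuts sentence text out of the input using start and end offsets. `SentenceOffsets` is a hypothetical stand-in written for this example, not the OpenNLP `Span` class, and the convention that the start offset is inclusive and the end offset is exclusive is an assumption of the sketch.

```java
// Hypothetical stand-in for a sentence span; not the OpenNLP Span class.
// Assumes start is inclusive and end is exclusive.
final class SentenceOffsets {
    final int start;
    final int end;

    SentenceOffsets(int start, int end) {
        this.start = start;
        this.end = end;
    }

    // Cuts the text covered by each span out of the original string.
    static String[] coveredText(String text, SentenceOffsets[] spans) {
        String[] out = new String[spans.length];
        for (int i = 0; i < spans.length; i++) {
            out[i] = text.substring(spans[i].start, spans[i].end);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Hi there. Bye!";
        SentenceOffsets[] spans = {
            new SentenceOffsets(0, 9), new SentenceOffsets(10, 14)
        };
        for (String sentence : coveredText(text, spans)) {
            System.out.println(sentence);
        }
    }
}
```

Working with offsets rather than strings keeps the link back to the original text, which matters when later processing steps need character positions.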

getSentenceProbabilities

public double[] getSentenceProbabilities()

Returns the probabilities associated with the most recent call to sentDetect().

Returns: The probability for each sentence returned by the most recent call to sentDetect. If not applicable, an empty array is returned.
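One possible use of these values is to flag sentences whose boundary the model was unsure about. The sketch below takes a sentence array and a probability array of the kind described above and returns the sentences scoring below a threshold. `ConfidenceFilter`, `lowConfidence`, and the threshold value are hypothetical and exist only in this example; treating an empty probability array as "nothing to flag" is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: pairs each sentence with its
// probability and keeps the ones below a caller-chosen threshold.
final class ConfidenceFilter {
    static List<String> lowConfidence(String[] sentences, double[] probs, double threshold) {
        List<String> flagged = new ArrayList<>();
        if (probs.length == 0) {
            // Assumed handling of the "not applicable" case: nothing to flag.
            return flagged;
        }
        if (sentences.length != probs.length) {
            throw new IllegalArgumentException("expected one probability per sentence");
        }
        for (int i = 0; i < sentences.length; i++) {
            if (probs[i] < threshold) {
                flagged.add(sentences[i]);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        String[] sentences = { "It rained.", "We stayed in." };
        double[] probs = { 0.95, 0.40 }; // made-up values for illustration
        System.out.println(lowConfidence(sentences, probs, 0.5));
    }
}
```

Because the probabilities belong to the most recent detection call, they should be read before the detector is used on another string.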

isAcceptableBreak

protected boolean isAcceptableBreak(String s,
                                    int fromIndex,
                                    int candidateIndex)

Allows subclasses to keep an overzealous (read: poorly trained) model from flagging obvious non-breaks as breaks, based on some boolean determination of a break's acceptability.

The implementation here always returns true, which means that the MaxentModel's outcome is taken as is.

Parameters: s - the string in which the break occurred; fromIndex - the start of the segment currently being evaluated; candidateIndex - the index of the candidate sentence ending
Returns: true if the break is acceptable
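The kind of check a subclass might apply can be sketched as a standalone method with the same parameter list. The example below rejects a break when the token ending at the candidate position is a known abbreviation. `BreakFilter` and its abbreviation set are hypothetical, and the sketch assumes that `candidateIndex` points at the end-of-sentence character itself; a real override would live in a subclass of SentenceDetectorME.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, standalone version of a break check; not part of OpenNLP.
final class BreakFilter {
    // Made-up abbreviation list for illustration only.
    private static final Set<String> ABBREVIATIONS =
            new HashSet<>(Arrays.asList("Mr.", "Dr.", "etc."));

    // Same parameter list as isAcceptableBreak. Assumes candidateIndex is
    // the index of the candidate end-of-sentence character.
    static boolean isAcceptableBreak(String s, int fromIndex, int candidateIndex) {
        // Walk back to the start of the token that ends at the candidate.
        int tokenStart = candidateIndex;
        while (tokenStart > fromIndex
                && !Character.isWhitespace(s.charAt(tokenStart - 1))) {
            tokenStart--;
        }
        String token = s.substring(tokenStart, candidateIndex + 1);
        // Veto the break if the token is a known abbreviation.
        return !ABBREVIATIONS.contains(token);
    }

    public static void main(String[] args) {
        String text = "Dr. Who left.";
        System.out.println(isAcceptableBreak(text, 0, 2));
        System.out.println(isAcceptableBreak(text, 0, 12));
    }
}
```

A rule like this only vetoes breaks the model proposes; it cannot add breaks the model missed.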

train

public static SentenceModel train(String languageCode,
                                  ObjectStream<SentenceSample> samples,
                                  boolean useTokenEnd,
                                  Dictionary abbreviations)
                           throws IOException

Throws: IOException

train

public static SentenceModel train(String languageCode,
                                  ObjectStream<SentenceSample> samples,
                                  boolean useTokenEnd,
                                  Dictionary abbreviations,
                                  int cutoff,
                                  int iterations)
                           throws IOException

Throws: IOException

main

public static void main(String[] args)
                 throws IOException

Trains a new sentence detection model.

Usage: opennlp.tools.sentdetect.SentenceDetectorME data_file new_model_name (iterations cutoff)?

Parameters: args - the command-line arguments, as given in the usage line above
Throws: IOException
