opennlp.tools.tokenize
Class TokenSampleStream
java.lang.Object
opennlp.tools.util.FilterObjectStream<String,TokenSample>
opennlp.tools.tokenize.TokenSampleStream
- All Implemented Interfaces:
- ObjectStream<TokenSample>
public class TokenSampleStream
- extends FilterObjectStream<String,TokenSample>
This class is a stream filter that reads string-encoded samples and creates
TokenSamples from them. The input string is split into tokens wherever
whitespace or the special separator sequence occurs.
Sample:
"token1 token2 token3<SPLIT>token4"
The tokens token1 and token2 are separated by whitespace; token3 and token4
are separated by the special character sequence, in this case the default
split sequence <SPLIT>.
The separator sequence cannot be escaped, so it must not occur anywhere else in the input string.
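The splitting rule described above can be illustrated with a short, library-free sketch. It assumes the default split sequence is `<SPLIT>`. The class `SplitSketch` and its methods `tokens` and `plainText` are hypothetical names introduced only for this illustration. They are not part of OpenNLP and do not reproduce how TokenSampleStream is actually implemented (the real class produces TokenSample objects with token spans, not string lists).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SplitSketch {
    // Assumed default split sequence; passed explicitly so other sequences can be tried.
    public static final String DEFAULT_SEPARATOR = "<SPLIT>";

    // Splits first on whitespace, then on the literal separator sequence.
    public static List<String> tokens(String sample, String separator) {
        List<String> result = new ArrayList<>();
        for (String chunk : sample.trim().split("\\s+")) {
            // Pattern.quote treats the separator as plain text, not as a regex.
            for (String token : chunk.split(Pattern.quote(separator))) {
                if (!token.isEmpty()) {
                    result.add(token);
                }
            }
        }
        return result;
    }

    // The underlying text of the sample: the separator marks a token boundary
    // without whitespace, so it is removed rather than replaced by a space.
    public static String plainText(String sample, String separator) {
        return sample.replace(separator, "");
    }

    public static void main(String[] args) {
        String sample = "token1 token2 token3<SPLIT>token4";
        System.out.println(tokens(sample, DEFAULT_SEPARATOR));
        System.out.println(plainText(sample, DEFAULT_SEPARATOR));
    }
}
```

Because the separator is matched literally and there is no escape mechanism, any accidental occurrence of the sequence in the text would be treated as a token boundary.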
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
TokenSampleStream
public TokenSampleStream(ObjectStream<String> sampleStrings,
String separatorChars)
TokenSampleStream
public TokenSampleStream(ObjectStream<String> sentences)
read
public TokenSample read()
throws IOException
- Description copied from interface:
ObjectStream
- Returns the next object. Calling this method repeatedly until it returns
null will return each object from the underlying source exactly once.
- Returns:
- the next object or null to signal that the stream is exhausted
- Throws:
IOException
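The read contract above (call read() repeatedly until it returns null) can be sketched as a simple loop. To keep the example self-contained, it uses a hypothetical stand-in interface, `SimpleStream`, that mirrors only the `read()` method described here. It is not the OpenNLP ObjectStream type, and `ReadLoopSketch`, `fromList`, and `drain` are illustrative names. With the library on the classpath, the same loop shape would presumably apply to a TokenSampleStream.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ReadLoopSketch {
    // Hypothetical stand-in for the contract: read() returns null once exhausted.
    public interface SimpleStream<T> {
        T read() throws IOException;
    }

    // Wraps a list so that each element is returned exactly once, then null.
    public static <T> SimpleStream<T> fromList(List<T> items) {
        Iterator<T> it = items.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    // Consumes the stream until read() signals exhaustion with null.
    public static <T> List<T> drain(SimpleStream<T> stream) throws IOException {
        List<T> out = new ArrayList<>();
        T item;
        while ((item = stream.read()) != null) {
            out.add(item);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        SimpleStream<String> samples = fromList(List.of("token1 token2", "token3<SPLIT>token4"));
        for (String s : drain(samples)) {
            System.out.println(s);
        }
    }
}
```

Since null is the end-of-stream signal, a stream following this contract cannot carry null as a regular element.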
Copyright © 2010. All Rights Reserved.