Interface Summary | |
---|---|
Detokenizer | A Detokenizer merges tokens back to their untokenized representation. |
TokenContextGenerator | Interface for TokenizerME context generators. |
Tokenizer | The interface for tokenizers, which segment a string into its tokens. |
Class Summary | |
---|---|
DefaultTokenContextGenerator | Generate events for maxent decisions for tokenization. |
DetokenizationDictionary | |
DictionaryDetokenizer | A rule based detokenizer. |
SimpleTokenizer | Performs tokenization using character classes. |
TokenizerCrossValidator | |
TokenizerEvaluator | The TokenizerEvaluator measures the performance of the given Tokenizer with the provided reference TokenSamples. |
TokenizerME | A Tokenizer for converting raw text into separated tokens. |
TokenizerModel | The TokenizerModel is the model used by a learnable Tokenizer. |
TokenizerStream | The TokenizerStream uses a tokenizer to tokenize the input string and output TokenSamples. |
TokenSample | A TokenSample is text with token spans. |
TokenSampleStream | This class is a stream filter which reads in string encoded samples and creates TokenSamples out of them. |
TokSpanEventStream | This class reads the TokenSamples from the given Iterator and converts the TokenSamples into Events which can be used by the maxent library for training. |
WhitespaceTokenizer | This tokenizer uses white spaces to tokenize the input text. |
WhitespaceTokenStream | This stream formats TokenSamples into whitespace-separated token strings. |
Enum Summary | |
---|---|
DetokenizationDictionary.Operation | |
Detokenizer.DetokenizationOperation | This enum contains an operation for every token to merge the tokens together to their detokenized form. |

Contains classes related to finding tokens or words in a string. All
tokenizers implement the Tokenizer interface. Currently there are three
implementations: the learnable TokenizerME, the WhitespaceTokenizer, and
the SimpleTokenizer, which is a character class tokenizer.
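
The difference between whitespace and character class tokenization can be illustrated with a small self-contained sketch. This is not the package's implementation: `TokenizerSketch`, `SketchTokenizer`, `whitespaceTokenize`, `charClassTokenize`, and the four character classes used here are hypothetical names and simplifications introduced only for illustration, and the real SimpleTokenizer may group characters differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; none of these names belong to the package.
class TokenizerSketch {

    /** Hypothetical stand-in for a tokenizer interface: a string in, its tokens out. */
    interface SketchTokenizer {
        String[] tokenize(String text);
    }

    // Assumed character classes for this sketch.
    static final int WHITESPACE = 0, LETTER = 1, DIGIT = 2, OTHER = 3;

    static int classify(char c) {
        if (Character.isWhitespace(c)) return WHITESPACE;
        if (Character.isLetter(c)) return LETTER;
        if (Character.isDigit(c)) return DIGIT;
        return OTHER;
    }

    /** Splits the text on runs of whitespace and nothing else. */
    static String[] whitespaceTokenize(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Starts a new token whenever the character class changes; whitespace is dropped. */
    static String[] charClassTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int previousClass = -1;
        for (char c : text.toCharArray()) {
            int cls = classify(c);
            // A class change closes the token collected so far.
            if (cls != previousClass && current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            if (cls != WHITESPACE) {
                current.append(c);
            }
            previousClass = cls;
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String text = "Mr. Smith's 2 cats.";
        SketchTokenizer whitespace = TokenizerSketch::whitespaceTokenize;
        SketchTokenizer charClass = TokenizerSketch::charClassTokenize;
        // prints [Mr., Smith's, 2, cats.]
        System.out.println(Arrays.toString(whitespace.tokenize(text)));
        // prints [Mr, ., Smith, ', s, 2, cats, .]
        System.out.println(Arrays.toString(charClass.tokenize(text)));
    }
}
```

Because both strategies sit behind one interface, calling code can switch between them without changes. A learnable tokenizer such as TokenizerME would additionally need a trained TokenizerModel, which this sketch does not attempt to model.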