java.lang.Object opennlp.tools.parser.AbstractBottomUpParser
public abstract class AbstractBottomUpParser
Abstract class which contains code to tag and chunk parses for bottom-up parsing and leaves the implementation of advancing and completing parses to extending classes.
Note: The nodes within the returned parses are shared with other parses, and therefore their parent node references will not be consistent with their child node references. setParents can be used to make the parents consistent with a particular parse, but subsequent calls to setParents can invalidate the results of earlier calls.
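The effect described in the note can be illustrated with a small, self-contained sketch. The `Node` type and the `setParents` method below are hypothetical stand-ins, not the OpenNLP `Parse` API; the only assumption is that each node keeps a list of children and a single parent reference.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parse node; not the OpenNLP Parse class.
class SharedNodeSketch {
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node parent; // a node can record only one parent at a time

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) {
                children.add(k);
            }
        }
    }

    // Walks the tree top-down and points every child back at its parent.
    static void setParents(Node root) {
        for (Node child : root.children) {
            child.parent = root;
            setParents(child);
        }
    }

    public static void main(String[] args) {
        Node shared = new Node("NP");
        Node parseA = new Node("S", shared);  // first candidate parse
        Node parseB = new Node("VP", shared); // second candidate reuses the same node

        setParents(parseA);
        System.out.println(shared.parent.label); // S
        setParents(parseB);
        System.out.println(shared.parent.label); // VP: the earlier result is overwritten
    }
}
```

Because the shared node holds one parent reference, the second call silently replaces what the first call set up, which is why the parents should be assigned immediately before a particular parse is used.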
Field Summary | |
---|---|
protected Chunker | chunker - The chunker that the parser uses to chunk non-recursive structures. |
static String | COMPLETE - Outcome used when a constituent is complete. |
protected Heap<Parse> | completeParses - Completed parses. |
static String | CONT - Prefix for outcomes continuing a constituent. |
protected boolean | createDerivationString - Specifies whether a derivation string should be created during parsing. |
protected boolean | debugOn - Turns debug print on or off. |
static double | defaultAdvancePercentage - The default amount of probability mass required of advanced outcomes. |
static int | defaultBeamSize - The default beam size used if no beam size is given. |
protected HeadRules | headRules - The head rules for the parser. |
static String | INC_NODE - The label for the top of an incomplete node. |
static String | INCOMPLETE - Outcome used when a constituent is incomplete. |
protected int | K - The maximum number of parses to advance from a single preceding parse. |
protected int | M - The maximum number of parses advanced from all preceding parses at each derivation step. |
protected Heap<Parse> | ndh - Incomplete parses which have been advanced. |
protected Heap<Parse> | odh - Incomplete parses which will be advanced. |
static String | OTHER - Outcome for a token which is not contained in a basal constituent. |
protected Set<String> | punctSet - The set of strings which are considered punctuation for the parser. |
protected double | Q - The minimum total probability mass of advanced outcomes. |
protected boolean | reportFailedParse - Specifies whether failed parses should be reported to standard error. |
static String | START - Prefix for outcomes starting a constituent. |
protected POSTagger | tagger - The POS tagger that the parser uses. |
static String | TOK_NODE - The label for a token node. |
static String | TOP_NODE - The label for the top node. |
static Integer | ZERO - The integer 0. |
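The search limits K and Q can be pictured with a short sketch. The code below is an illustration under stated assumptions, not the OpenNLP implementation: `selectOutcomes` is a hypothetical helper that advances the most probable outcomes of one parse until either `k` outcomes are chosen or their combined probability reaches `q`. The per-step limit M, which caps the total across all preceding parses, is not modeled here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: one plausible way to combine a per-parse limit (K)
// with a probability-mass threshold (Q). Not taken from the OpenNLP source.
class BeamLimitSketch {
    // Returns the indices of the outcomes to advance, most probable first.
    static List<Integer> selectOutcomes(double[] probs, int k, double q) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < probs.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> probs[i]).reversed());

        List<Integer> selected = new ArrayList<>();
        double mass = 0.0;
        for (int idx : order) {
            if (selected.size() >= k || mass >= q) {
                break; // stop at the K limit or once Q of the mass is covered
            }
            selected.add(idx);
            mass += probs[idx];
        }
        return selected;
    }

    public static void main(String[] args) {
        double[] probs = {0.1, 0.6, 0.05, 0.25};
        System.out.println(selectOutcomes(probs, 3, 0.8)); // [1, 3]
    }
}
```

In this example the two most probable outcomes already cover 0.85 of the mass, so the third slot allowed by `k` is never used.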
Constructor Summary | |
---|---|
AbstractBottomUpParser(POSTagger tagger, Chunker chunker, HeadRules headRules, int beamSize, double advancePercentage) |
Method Summary | |
---|---|
protected Parse[] | advanceChunks(Parse p, double minChunkScore) - Returns the top chunk sequences for the specified parse. |
protected abstract Parse[] | advanceParses(Parse p, double probMass) - Advances the specified parse and returns an array of advanced parses whose probability accounts for more than the specified amount of probability mass. |
protected Parse[] | advanceTags(Parse p) - Advances the parse by assigning it POS tags and returns multiple tag sequences. |
protected abstract void | advanceTop(Parse p) - Adds the "TOP" node to the specified parse. |
static Dictionary | buildDictionary(ObjectStream<Parse> data, HeadRules rules, int cutoff) - Creates an n-gram dictionary from the specified data stream using the specified head rules and the specified cut-off. |
static Parse[] | collapsePunctuation(Parse[] chunks, Set<String> punctSet) - Removes the punctuation from the specified set of chunks, adds it to the parses adjacent to the punctuation, and returns a new array of parses with the punctuation removed. |
protected int | mapParseIndex(int index, Parse[] nonPunctParses, Parse[] parses) - Maps the specified index into the parses without punctuation to the corresponding index into the parses with punctuation. |
Parse | parse(Parse tokens) - Returns a parse for the specified parse of tokens. |
Parse[] | parse(Parse tokens, int numParses) - Returns the specified number of parses or fewer for the specified tokens. |
void | setErrorReporting(boolean errorReporting) - Specifies whether the parser should report when it was unable to find a parse for a particular sentence. |
static void | setParents(Parse p) - Assigns parent references for the specified parse so that they are consistent with the children references. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected int M
protected int K
protected double Q
public static final int defaultBeamSize
public static final double defaultAdvancePercentage
protected Heap<Parse> completeParses
protected Heap<Parse> odh
protected Heap<Parse> ndh
protected HeadRules headRules
protected Set<String> punctSet
public static final String TOP_NODE
public static final String INC_NODE
public static final String TOK_NODE
public static final Integer ZERO
public static final String START
public static final String CONT
public static final String OTHER
public static final String COMPLETE
public static final String INCOMPLETE
protected POSTagger tagger
protected Chunker chunker
protected boolean reportFailedParse
protected boolean createDerivationString
protected boolean debugOn
Constructor Detail |
---|
public AbstractBottomUpParser(POSTagger tagger, Chunker chunker, HeadRules headRules, int beamSize, double advancePercentage)
Method Detail |
---|
public void setErrorReporting(boolean errorReporting)
errorReporting - If true then un-parsed sentences are reported, false otherwise.
public static void setParents(Parse p)
p - The parse whose parent references need to be assigned.
public static Parse[] collapsePunctuation(Parse[] chunks, Set<String> punctSet)
chunks - A set of parses.
punctSet - The set of punctuation which is to be removed.
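A rough picture of what collapsing punctuation involves is sketched below. The `Chunk` type and its `punctBefore` and `punctAfter` lists are hypothetical and exist only for this illustration; the actual method works on `Parse` objects. The sketch assumes that each punctuation item is recorded on both of its non-punctuation neighbors and then dropped from the returned sequence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CollapsePunctuationSketch {
    // Hypothetical chunk type: its text plus the punctuation attached on each side.
    static final class Chunk {
        final String text;
        final List<String> punctBefore = new ArrayList<>();
        final List<String> punctAfter = new ArrayList<>();

        Chunk(String text) {
            this.text = text;
        }
    }

    // Returns the non-punctuation chunks, with neighboring punctuation recorded on them.
    static List<Chunk> collapse(List<Chunk> chunks, Set<String> punctSet) {
        List<Chunk> kept = new ArrayList<>();
        List<String> pending = new ArrayList<>(); // punctuation seen since the last kept chunk
        for (Chunk c : chunks) {
            if (punctSet.contains(c.text)) {
                if (!kept.isEmpty()) {
                    kept.get(kept.size() - 1).punctAfter.add(c.text);
                }
                pending.add(c.text);
            } else {
                c.punctBefore.addAll(pending);
                pending.clear();
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Chunk> chunks = new ArrayList<>();
        for (String s : new String[] {"Yes", ",", "it", "works", "."}) {
            chunks.add(new Chunk(s));
        }
        Set<String> punct = new HashSet<>(Arrays.asList(",", "."));
        for (Chunk c : collapse(chunks, punct)) {
            System.out.println(c.punctBefore + " " + c.text + " " + c.punctAfter);
        }
    }
}
```

Three chunks remain after the call, and the comma is still reachable from both "Yes" and "it" even though it no longer occupies a position in the sequence.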
protected abstract Parse[] advanceParses(Parse p, double probMass)
p - The parse to advance.
probMass - The amount of probability mass that should be accounted for by the advanced parses.
protected abstract void advanceTop(Parse p)
p - The complete parse.
public Parse[] parse(Parse tokens, int numParses)
setParents can be used to make the parents consistent with a particular parse, but subsequent calls to setParents can invalidate the results of earlier calls.
Specified by: parse in interface Parser
tokens - A parse containing the tokens with a single parent node.
numParses - The number of parses desired.
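The "specified number of parses or fewer" behavior can be sketched with a bounded collection of scores. This is a minimal illustration and not the class's own `Heap` type: `best` is a hypothetical helper, and treating each parse as a single log-probability score is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class TopParsesSketch {
    // Keeps at most numParses of the highest scores and returns them best first.
    static List<Double> best(double[] logProbs, int numParses) {
        PriorityQueue<Double> heap = new PriorityQueue<>(); // min-heap: worst kept score on top
        for (double lp : logProbs) {
            heap.add(lp);
            if (heap.size() > numParses) {
                heap.poll(); // drop the current worst
            }
        }
        List<Double> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(best(new double[] {-4.2, -1.5, -3.0}, 2)); // [-1.5, -3.0]
        System.out.println(best(new double[] {-4.2}, 2));             // [-4.2]
    }
}
```

The second call shows the "or fewer" case: when only one complete candidate exists, one result is returned even though two were requested.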
public Parse parse(Parse tokens)
Specified by: parse in interface Parser
tokens - The root node of a flat parse containing only tokens.
protected Parse[] advanceChunks(Parse p, double minChunkScore)
p - A parse with POS tags assigned.
minChunkScore - A minimum score below which chunks should not be advanced.
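Chunk sequences are expressed with the START, CONT, and OTHER outcomes listed in the field summary. The sketch below shows how such a per-token outcome sequence could be grouped into chunks. The string values "S-", "C-", and "O" are assumptions made for this example and should be checked against the constants themselves; `group` is a hypothetical helper, not part of the class.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkOutcomeSketch {
    // Assumed values for illustration; check the constants on the class itself.
    static final String START = "S-";
    static final String CONT = "C-";
    static final String OTHER = "O";

    // Groups tokens into labeled chunks such as "NP[the dog]".
    // Tokens that neither start nor continue a chunk are emitted on their own.
    static List<String> group(String[] tokens, String[] outcomes) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = null;
        for (int i = 0; i < tokens.length; i++) {
            String out = outcomes[i];
            if (out.startsWith(CONT) && current != null) {
                current.append(' ').append(tokens[i]); // extend the open chunk
                continue;
            }
            if (current != null) {
                chunks.add(current.append(']').toString()); // close the open chunk
                current = null;
            }
            if (out.startsWith(START)) {
                current = new StringBuilder(out.substring(START.length()))
                    .append('[').append(tokens[i]);
            } else {
                chunks.add(tokens[i]);
            }
        }
        if (current != null) {
            chunks.add(current.append(']').toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String[] tokens = {"the", "dog", "barks", "."};
        String[] outcomes = {START + "NP", CONT + "NP", START + "VP", OTHER};
        System.out.println(group(tokens, outcomes)); // [NP[the dog], VP[barks], .]
    }
}
```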
protected Parse[] advanceTags(Parse p)
p - The parse to be tagged.
protected int mapParseIndex(int index, Parse[] nonPunctParses, Parse[] parses)
index - An index into the parses without punctuation.
nonPunctParses - The parses without punctuation.
parses - The parses with punctuation.
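The index mapping can be sketched as a forward scan. The helper below is a hypothetical illustration on plain object arrays rather than `Parse[]`, and it assumes that the punctuation-free array holds the same objects as the full array, in the same order.

```java
class MapIndexSketch {
    // Scans forward from 'index' until the same object is found in the full array.
    // Relies on nonPunct being an order-preserving subsequence of all,
    // so the target can never sit before position 'index'.
    static int mapIndex(int index, Object[] nonPunct, Object[] all) {
        int i = index;
        while (all[i] != nonPunct[index]) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        String[] all = {"Yes", ",", "it", "works", "."};
        String[] nonPunct = {all[0], all[2], all[3]};
        System.out.println(mapIndex(1, nonPunct, all)); // 2
    }
}
```

Starting the scan at `index` is safe because removing elements can only shift later items to smaller positions, never to larger ones.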
public static Dictionary buildDictionary(ObjectStream<Parse> data, HeadRules rules, int cutoff) throws IOException
data - The data stream of parses.
rules - The head rules for the parses.
cutoff - The minimum number of entries required for the n-gram to be saved as part of the dictionary.
Throws: IOException
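The role of the cut-off can be shown with a simplified sketch that counts n-grams over plain token lists. It ignores head rules and parse structure entirely, and both the `build` helper and its `maxN` parameter are hypothetical; the only point is how a minimum count filters what is kept.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class NGramCutoffSketch {
    // Counts every n-gram of length 1..maxN and keeps those seen at least 'cutoff' times.
    static Set<String> build(List<List<String>> sentences, int maxN, int cutoff) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> tokens : sentences) {
            for (int start = 0; start < tokens.size(); start++) {
                for (int n = 1; n <= maxN && start + n <= tokens.size(); n++) {
                    String gram = String.join(" ", tokens.subList(start, start + n));
                    counts.merge(gram, 1, Integer::sum);
                }
            }
        }
        Set<String> kept = new TreeSet<>(); // sorted for stable output
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= cutoff) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<String>> data = Arrays.asList(
            Arrays.asList("the", "dog", "barks"),
            Arrays.asList("the", "dog", "sleeps"));
        System.out.println(build(data, 2, 2)); // [dog, the, the dog]
    }
}
```

With a cut-off of 2, only the n-grams that occur in both sentences survive; raising the cut-off to 3 would leave the result empty.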