AbstractBottomUpParser (OpenNLP Tools 1.5.0 API)

opennlp.tools.parser
Class AbstractBottomUpParser

java.lang.Object
  opennlp.tools.parser.AbstractBottomUpParser

All Implemented Interfaces:: Parser

Direct Known Subclasses:: opennlp.tools.parser.chunking.Parser, opennlp.tools.parser.treeinsert.Parser

public abstract class AbstractBottomUpParser
extends Object
implements Parser

Abstract class which contains code to tag and chunk parses for bottom-up parsing, and leaves the implementation of advancing and completing parses to extending classes.

Note:
The nodes within the returned parses are shared with other parses, and therefore their parent node references will not be consistent with their child node references. setParents can be used to make the parents consistent with a particular parse, but subsequent calls to setParents can invalidate the results of earlier calls.

Field Summary
`protected Chunker`	`chunker` The chunker that the parser uses to chunk non-recursive structures.
`static String`	`COMPLETE` Outcome used when a constituent is complete.
`protected Heap<Parse>`	`completeParses` Completed parses.
`static String`	`CONT` Prefix for outcomes continuing a constituent.
`protected boolean`	`createDerivationString` Specifies whether a derivation string should be created during parsing.
`protected boolean`	`debugOn` Turns debug print on or off.
`static double`	`defaultAdvancePercentage` The default amount of probability mass required of advanced outcomes.
`static int`	`defaultBeamSize` The default beam size used if no beam size is given.
`protected HeadRules`	`headRules` The head rules for the parser.
`static String`	`INC_NODE` The label for the top node of an incomplete parse.
`static String`	`INCOMPLETE` Outcome used when a constituent is incomplete.
`protected int`	`K` The maximum number of parses to advance from a single preceding parse.
`protected int`	`M` The maximum number of parses advanced from all preceding parses at each derivation step.
`protected Heap<Parse>`	`ndh` Incomplete parses which have been advanced.
`protected Heap<Parse>`	`odh` Incomplete parses which will be advanced.
`static String`	`OTHER` Outcome for a token which is not contained in a basal constituent.
`protected Set<String>`	`punctSet` The set of strings which are considered punctuation for the parser.
`protected double`	`Q` The minimum total probability mass of advanced outcomes.
`protected boolean`	`reportFailedParse` Specifies whether failed parses should be reported to standard error.
`static String`	`START` Prefix for outcomes starting a constituent.
`protected POSTagger`	`tagger` The POS tagger that the parser uses.
`static String`	`TOK_NODE` The label for a token node.
`static String`	`TOP_NODE` The label for the top node.
`static Integer`	`ZERO` The integer 0.

Constructor Summary
`AbstractBottomUpParser(POSTagger tagger, Chunker chunker, HeadRules headRules, int beamSize, double advancePercentage)`

Method Summary
`protected Parse[]`	`advanceChunks(Parse p, double minChunkScore)` Returns the top chunk sequences for the specified parse.
`protected abstract Parse[]`	`advanceParses(Parse p, double probMass)` Advances the specified parse and returns an array of advanced parses whose probability accounts for more than the specified amount of probability mass.
`protected Parse[]`	`advanceTags(Parse p)` Advances the parse by assigning it POS tags and returns multiple tag sequences.
`protected abstract void`	`advanceTop(Parse p)` Adds the "TOP" node to the specified parse.
`static Dictionary`	`buildDictionary(ObjectStream<Parse> data, HeadRules rules, int cutoff)` Creates an n-gram dictionary from the specified data stream using the specified head rules and cut-off.
`static Parse[]`	`collapsePunctuation(Parse[] chunks, Set<String> punctSet)` Removes the punctuation from the specified set of chunks, attaches it to the parses adjacent to the punctuation, and returns a new array of parses with the punctuation removed.
`protected int`	`mapParseIndex(int index, Parse[] nonPunctParses, Parse[] parses)` Maps the specified index into the parses without punctuation to the corresponding index into the specified parses (with punctuation).
`Parse`	`parse(Parse tokens)` Returns a parse for the specified parse of tokens.
`Parse[]`	`parse(Parse tokens, int numParses)` Returns the specified number of parses or fewer for the specified tokens.
`void`	`setErrorReporting(boolean errorReporting)` Specifies whether the parser should report when it was unable to find a parse for a particular sentence.
`static void`	`setParents(Parse p)` Assigns parent references for the specified parse so that they are consistent with the children references.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

M

protected int M

The maximum number of parses advanced from all preceding parses at each derivation step.

K

protected int K

The maximum number of parses to advance from a single preceding parse.

Q

protected double Q

The minimum total probability mass of advanced outcomes.
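
K and Q together bound how many outcomes a single parse can spawn. A minimal sketch of that selection rule, assuming the outcome probabilities for one parse are available as a plain array: outcomes are taken best-first until either K have been taken or their cumulative probability reaches Q. The class `OutcomePruning` and method `selectOutcomes` are hypothetical and are not the OpenNLP implementation; M, which caps the total across all parses at a derivation step, is not modelled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: how K and Q can bound the outcomes advanced from one parse. */
final class OutcomePruning {

    /**
     * Returns outcome indices in descending order of probability, stopping once
     * k outcomes have been taken or their cumulative probability reaches q.
     */
    static List<Integer> selectOutcomes(double[] probs, int k, double q) {
        Integer[] order = new Integer[probs.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort outcome indices by descending probability.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> probs[i]).reversed());

        List<Integer> selected = new ArrayList<>();
        double mass = 0.0;
        for (int idx : order) {
            if (selected.size() >= k || mass >= q) {
                break; // K outcomes taken, or Q of the mass already covered
            }
            selected.add(idx);
            mass += probs[idx];
        }
        return selected;
    }
}
```

For example, with probabilities {0.5, 0.3, 0.15, 0.05}, K = 10 and Q = 0.7, only the first two outcomes are advanced, because together they already cover 0.8 of the mass.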

defaultBeamSize

public static final int defaultBeamSize

The default beam size used if no beam size is given.

See Also:: Constant Field Values

defaultAdvancePercentage

public static final double defaultAdvancePercentage

The default amount of probability mass required of advanced outcomes.

See Also:: Constant Field Values

completeParses

protected Heap<Parse> completeParses

Completed parses.

odh

protected Heap<Parse> odh

Incomplete parses which will be advanced.

ndh

protected Heap<Parse> ndh

Incomplete parses which have been advanced.
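
The three heaps describe a step-wise beam search: derivations waiting to be advanced (odh), their successors (ndh), and finished parses (completeParses). A minimal sketch of such a loop, assuming a generic `Derivation` with a score and a completeness flag; `Derivation`, `BeamSearchSketch`, `advance` and `search` are hypothetical names, `java.util.PriorityQueue` stands in for OpenNLP's `Heap`, and the control flow is an illustration of the idea rather than the code behind `parse(Parse, int)`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical partial parse: a score (higher is better) and a completeness flag. */
interface Derivation {
    double score();
    boolean isComplete();
}

/** Hypothetical bottom-up beam search using odh, ndh and completeParses queues. */
abstract class BeamSearchSketch<D extends Derivation> {
    private final Comparator<Derivation> best =
            Comparator.<Derivation>comparingDouble(Derivation::score).reversed();

    /** Subclasses supply the successors of a derivation (compare advanceParses). */
    protected abstract List<D> advance(D d);

    List<D> search(D start, int m, int numParses) {
        PriorityQueue<D> odh = new PriorityQueue<D>(best);
        PriorityQueue<D> completeParses = new PriorityQueue<D>(best);
        odh.add(start);
        while (!odh.isEmpty() && completeParses.size() < numParses) {
            PriorityQueue<D> ndh = new PriorityQueue<D>(best);
            // Advance at most m of the best derivations carried over from the previous step.
            for (int i = 0; i < m && !odh.isEmpty(); i++) {
                for (D next : advance(odh.poll())) {
                    if (next.isComplete()) {
                        completeParses.add(next);
                    } else {
                        ndh.add(next);
                    }
                }
            }
            odh = ndh; // the successors become the input of the next step
        }
        List<D> result = new ArrayList<>();
        while (!completeParses.isEmpty() && result.size() < numParses) {
            result.add(completeParses.poll());
        }
        return result;
    }
}
```

Keeping only m derivations per step is what makes the search a beam search: lower-scoring derivations left in odh are simply dropped when ndh replaces it.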

headRules

protected HeadRules headRules

The head rules for the parser.

punctSet

protected Set<String> punctSet

The set of strings which are considered punctuation for the parser. Punctuation is not attached, but floats to the top of the parse as attachment decisions are made about its non-punctuation sister nodes.

TOP_NODE

public static final String TOP_NODE

The label for the top node.

See Also:: Constant Field Values

INC_NODE

public static final String INC_NODE

The label for the top node of an incomplete parse.

See Also:: Constant Field Values

TOK_NODE

public static final String TOK_NODE

The label for a token node.

See Also:: Constant Field Values

ZERO

public static final Integer ZERO

The integer 0.

START

public static final String START

Prefix for outcomes starting a constituent.

See Also:: Constant Field Values

CONT

public static final String CONT

Prefix for outcomes continuing a constituent.

See Also:: Constant Field Values

OTHER

public static final String OTHER

Outcome for a token which is not contained in a basal constituent.

See Also:: Constant Field Values
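
START, CONT and OTHER let a per-token outcome sequence encode basal (non-recursive) constituents: a START-prefixed outcome opens a constituent, a CONT-prefixed outcome with the same label extends it, and OTHER leaves the token outside any constituent. A minimal decoding sketch, assuming the prefixes are spelled "S-" and "C-" and OTHER is "O"; those spellings are assumptions for illustration (the real values are on the Constant Field Values page), and `ChunkDecoder` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical decoder from START/CONT/OTHER outcomes to labeled token spans. */
final class ChunkDecoder {
    // Assumed spellings for this sketch only.
    static final String START = "S-";
    static final String CONT = "C-";
    static final String OTHER = "O";

    /** A labeled span of token indices [start, end], inclusive. */
    static final class Span {
        final String label;
        final int start;
        final int end;
        Span(String label, int start, int end) { this.label = label; this.start = start; this.end = end; }
        @Override public String toString() { return label + "[" + start + "," + end + "]"; }
    }

    static List<Span> decode(String[] outcomes) {
        List<Span> spans = new ArrayList<>();
        String label = null; // label of the currently open constituent, if any
        int begin = -1;
        for (int i = 0; i < outcomes.length; i++) {
            String o = outcomes[i];
            if (label != null && o.equals(CONT + label)) {
                continue; // extend the open constituent
            }
            if (label != null) { // any other outcome closes the open constituent
                spans.add(new Span(label, begin, i - 1));
                label = null;
            }
            if (o.startsWith(START)) {
                label = o.substring(START.length());
                begin = i;
            }
            // OTHER (or a CONT that does not match) leaves the token unchunked.
        }
        if (label != null) {
            spans.add(new Span(label, begin, outcomes.length - 1));
        }
        return spans;
    }
}
```

Under these assumptions, the outcomes S-NP, C-NP, O, S-VP, S-NP, C-NP decode to NP over tokens 0 to 1, VP over token 3, and NP over tokens 4 to 5.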

COMPLETE

public static final String COMPLETE

Outcome used when a constituent is complete.

See Also:: Constant Field Values

INCOMPLETE

public static final String INCOMPLETE

Outcome used when a constituent is incomplete.

See Also:: Constant Field Values

tagger

protected POSTagger tagger

The POS tagger that the parser uses.

chunker

protected Chunker chunker

The chunker that the parser uses to chunk non-recursive structures.

reportFailedParse

protected boolean reportFailedParse

Specifies whether failed parses should be reported to standard error.

createDerivationString

protected boolean createDerivationString

Specifies whether a derivation string should be created during parsing. This is useful for debugging.

debugOn

protected boolean debugOn

Turns debug printing on or off.

Constructor Detail

AbstractBottomUpParser

public AbstractBottomUpParser(POSTagger tagger,
                              Chunker chunker,
                              HeadRules headRules,
                              int beamSize,
                              double advancePercentage)

Method Detail

setErrorReporting

public void setErrorReporting(boolean errorReporting)

Specifies whether the parser should report when it was unable to find a parse for a particular sentence.

Parameters:: errorReporting - If true, unparsed sentences are reported; if false, they are not.

setParents

public static void setParents(Parse p)

Assigns parent references for the specified parse so that they are consistent with the children references.

Parameters:: p - The parse whose parent references need to be assigned.
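
Because nodes are shared between parses, a node's single parent field can only agree with one of the trees that contain it. A minimal sketch of the fix-up, assuming a simple mutable tree: walk the chosen tree top-down and point each child's parent at the node that lists it. `TreeNode` is a hypothetical stand-in for `Parse`, not the OpenNLP class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tree node standing in for Parse; not the OpenNLP class. */
final class TreeNode {
    final String label;
    final List<TreeNode> children = new ArrayList<>();
    TreeNode parent;

    TreeNode(String label, TreeNode... kids) {
        this.label = label;
        for (TreeNode kid : kids) {
            children.add(kid);
            kid.parent = this; // for a shared node, the last tree built "wins"
        }
    }

    /** Walks the tree top-down, pointing each child's parent at the node that lists it. */
    static void setParents(TreeNode p) {
        for (TreeNode child : p.children) {
            child.parent = p;
            setParents(child);
        }
    }
}
```

If a token node is shared by two trees, calling `setParents` on the first tree makes that tree consistent; a later call on the second tree silently repoints the shared node, which is exactly why later calls can invalidate earlier ones.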

collapsePunctuation

public static Parse[] collapsePunctuation(Parse[] chunks,
                                          Set<String> punctSet)

Removes the punctuation from the specified set of chunks, attaches it to the parses adjacent to the punctuation, and returns a new array of parses with the punctuation removed.

Parameters:: chunks - A set of parses.; punctSet - The set of punctuation which is to be removed.
Returns:: An array of parses which is a subset of chunks with punctuation removed.
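
A minimal sketch of collapsing punctuation, assuming a punctuation chunk is recognized by its type being in the punctuation set and that each mark is recorded on both the nearest preceding and the nearest following non-punctuation chunk. `PunctCollapse` and its nested `Chunk` are hypothetical types that only record the punctuation strings; they are not `Parse` and do not reproduce its API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of removing punctuation and attaching it to neighbors. */
final class PunctCollapse {
    /** Hypothetical chunk: its type plus punctuation attached on each side. */
    static final class Chunk {
        final String type;
        final List<String> prevPunct = new ArrayList<>();
        final List<String> nextPunct = new ArrayList<>();
        Chunk(String type) { this.type = type; }
    }

    static List<Chunk> collapse(List<Chunk> chunks, Set<String> punctSet) {
        List<Chunk> kept = new ArrayList<>();
        Chunk lastNonPunct = null;
        for (int i = 0; i < chunks.size(); i++) {
            Chunk c = chunks.get(i);
            if (!punctSet.contains(c.type)) {
                kept.add(c);
                lastNonPunct = c;
                continue;
            }
            // The mark follows the previous non-punctuation chunk...
            if (lastNonPunct != null) {
                lastNonPunct.nextPunct.add(c.type);
            }
            // ...and precedes the next non-punctuation chunk, if there is one.
            for (int j = i + 1; j < chunks.size(); j++) {
                Chunk n = chunks.get(j);
                if (!punctSet.contains(n.type)) {
                    n.prevPunct.add(c.type);
                    break;
                }
            }
        }
        return kept;
    }
}
```

For the chunk sequence NP , VP . the result is NP and VP, with the comma recorded after NP and before VP, and the period recorded after VP.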

advanceParses

protected abstract Parse[] advanceParses(Parse p,
                                         double probMass)

Advances the specified parse and returns an array of advanced parses whose probability accounts for more than the specified amount of probability mass.

Parameters:: p - The parse to advance.; probMass - The amount of probability mass that should be accounted for by the advanced parses.

advanceTop

protected abstract void advanceTop(Parse p)

Adds the "TOP" node to the specified parse.

Parameters:: p - The complete parse.

parse

public Parse[] parse(Parse tokens,
                     int numParses)

Description copied from interface: Parser

Returns the specified number of parses or fewer for the specified tokens.
Note: The nodes within the returned parses are shared with other parses, and therefore their parent node references will not be consistent with their child node references. setParents can be used to make the parents consistent with a particular parse, but subsequent calls to setParents can invalidate the results of earlier calls.

Specified by:: parse in interface Parser

Parameters:: tokens - A parse containing the tokens with a single parent node.; numParses - The number of parses desired.
Returns:: Up to the specified number of parses for the specified tokens.

parse

public Parse parse(Parse tokens)

Description copied from interface: Parser

Returns a parse for the specified parse of tokens.

Specified by:: parse in interface Parser

Parameters:: tokens - The root node of a flat parse containing only tokens.
Returns:: A full parse of the specified tokens, or the flat chunks of the tokens if a full parse could not be found.

advanceChunks

protected Parse[] advanceChunks(Parse p,
                                double minChunkScore)

Returns the top chunk sequences for the specified parse.

Parameters:: p - A parse with POS tags assigned.; minChunkScore - A minimum score below which chunks should not be advanced.
Returns:: The top chunk assignments to the specified parse.

advanceTags

protected Parse[] advanceTags(Parse p)

Advances the parse by assigning it POS tags and returns multiple tag sequences.

Parameters:: p - The parse to be tagged.
Returns:: Parses with different POS-tag sequence assignments.

mapParseIndex

protected int mapParseIndex(int index,
                            Parse[] nonPunctParses,
                            Parse[] parses)

Determines the mapping between the specified index into the specified parses without punctuation to the corresponding index into the specified parses.

Parameters:: index - An index into the parses without punctuation.; nonPunctParses - The parses without punctuation.; parses - The parses with punctuation.
Returns:: An index into the specified parses which corresponds to the same node as the specified index into the parses without punctuation.
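
Since the parses without punctuation are the same objects as the non-punctuation entries of the full array, in the same order, the mapping can be found by an identity search. A minimal sketch, assuming exactly that relationship; the scan starts at `index` because the full array holds at least as many elements before the target. `IndexMapping.mapIndex` is a hypothetical name, and returning -1 when no match exists is a choice made for this sketch.

```java
/** Hypothetical helper mapping an index in a punctuation-free array back to the full array. */
final class IndexMapping {

    static <T> int mapIndex(int index, T[] nonPunct, T[] parses) {
        T target = nonPunct[index];
        // The full array contains at least as many elements before the target,
        // so the search can start at the same index.
        for (int i = index; i < parses.length; i++) {
            if (parses[i] == target) { // identity, not equals: the arrays share nodes
                return i;
            }
        }
        return -1; // the arrays do not share this node
    }
}
```

Comparing by identity rather than `equals` matters here: two distinct nodes with the same label and span must not be confused.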

buildDictionary

public static Dictionary buildDictionary(ObjectStream<Parse> data,
                                         HeadRules rules,
                                         int cutoff)
                                  throws IOException

Creates an n-gram dictionary from the specified data stream using the specified head rules and cut-off.

Parameters:: data - The data stream of parses.; rules - The head rules for the parses.; cutoff - The minimum number of times an n-gram must occur to be saved as part of the dictionary.
Returns:: A dictionary object.
Throws:: IOException
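
The cut-off rule itself is simple frequency thresholding. A minimal sketch, assuming plain token sequences as input and n-grams joined by spaces: it counts every n-gram with minN ≤ n ≤ maxN and keeps those seen at least `cutoff` times. It deliberately ignores head rules and does not reproduce which n-grams `buildDictionary` extracts from parses or how it stores them in a `Dictionary`; `NGramDictionarySketch` is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical n-gram counter with a frequency cut-off. */
final class NGramDictionarySketch {

    static Set<String> build(List<List<String>> sentences, int minN, int maxN, int cutoff) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> tokens : sentences) {
            for (int n = minN; n <= maxN; n++) {
                for (int i = 0; i + n <= tokens.size(); i++) {
                    String gram = String.join(" ", tokens.subList(i, i + n));
                    counts.merge(gram, 1, Integer::sum);
                }
            }
        }
        // Keep only n-grams that occur at least `cutoff` times, sorted for readability.
        Set<String> dict = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= cutoff) {
                dict.add(e.getKey());
            }
        }
        return dict;
    }
}
```

With the sentences "the dog barks" and "the dog sleeps", unigrams and bigrams, and a cut-off of 2, only "dog", "the" and "the dog" survive.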
