public abstract class WiktionaryEntryParser extends Object implements IWiktionaryEntryParser
Parses the text of an article page into IWiktionaryEntry and IWiktionarySense instances. The parser is a finite state machine built on a set of block handlers, each of which is asked whether it wants to process the current line of text. A handler that accepts a line also processes the subsequent lines until the entire block has been consumed and the next line starts a different block handler. Because the individual Wiktionary language editions differ considerably, there should be one subclass of this parser for each language edition; the subclass takes care of language-specific adaptations and selects the block handlers to use.

Modifier and Type | Field and Description |
---|---|
protected static Pattern | COMMENT_PATTERN |
protected long | entryId |
protected List<IBlockHandler> | handlers |
protected static Pattern | IMAGE_PATTERN |
protected ILanguage | language |
protected String | redirectTemplate |
protected static Pattern | REFERENCES_PATTERN |
Constructor and Description |
---|
WiktionaryEntryParser(ILanguage language, String redirectName) - Instantiates the entry parser for the given language. |
Modifier and Type | Method and Description |
---|---|
protected boolean | checkForRedirect(WiktionaryPage page, String text) - Checks whether the specified text is a redirect and, if so, sets the redirect target of the given Wiktionary page. |
protected abstract ParsingContext | createParsingContext(WiktionaryPage page) |
ILanguage | getLanguage() - Returns the language of this parser's Wiktionary edition. |
protected abstract boolean | isStartOfBlock(String line) - Hotspot for deciding whether the given line is a potential start of a new article constituent. |
void | parse(WiktionaryPage page, String text) - Creates Wiktionary word entry instances from the provided text and adds them to the given article page. |
protected void | register(IBlockHandler handler) - Registers the given handler, which will be invoked during parsing. |
protected IBlockHandler | selectHandler(String line) - Finds a handler that is willing to handle the given line. |
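
The interplay of register, selectHandler and isStartOfBlock described in the class overview can be illustrated with a small, self-contained sketch. It does not use the library's own types: SketchHandler, RecordingHandler and BlockDispatchSketch are hypothetical stand-ins, the handler methods are invented for illustration and are not the IBlockHandler API, and the rule that a line starting with "==" opens a block is an assumption made only for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a block handler; NOT the real IBlockHandler API. */
interface SketchHandler {
    boolean canHandle(String line);   // asked for each potential block start
    void startBlock(String line);     // receives the line that opened the block
    void continueBlock(String line);  // receives each following line of the block
    void finishBlock();               // called once the block is complete
}

/** Hypothetical handler that records what it was asked to do. */
final class RecordingHandler implements SketchHandler {
    private final String heading;
    final List<String> events = new ArrayList<>();

    RecordingHandler(String heading) { this.heading = heading; }

    public boolean canHandle(String line) { return line.equals("== " + heading + " =="); }
    public void startBlock(String line) { events.add("start"); }
    public void continueBlock(String line) { events.add("body:" + line); }
    public void finishBlock() { events.add("end"); }
}

/** Minimal line dispatcher mirroring register / selectHandler / isStartOfBlock. */
final class BlockDispatchSketch {
    private final List<SketchHandler> handlers = new ArrayList<>();

    void register(SketchHandler handler) { handlers.add(handler); }

    /** Returns the first registered handler willing to take the line, or null. */
    SketchHandler selectHandler(String line) {
        for (SketchHandler handler : handlers) {
            if (handler.canHandle(line)) {
                return handler;
            }
        }
        return null;
    }

    /** Assumption for this sketch only: a heading line opens a new block. */
    boolean isStartOfBlock(String line) { return line.startsWith("=="); }

    void parse(String text) {
        SketchHandler current = null;
        for (String rawLine : text.split("\n")) {
            String line = rawLine.trim();
            if (isStartOfBlock(line)) {
                SketchHandler next = selectHandler(line);
                if (next != null) {
                    // State transition: close the running block, open the next one.
                    if (current != null) {
                        current.finishBlock();
                    }
                    current = next;
                    current.startBlock(line);
                    continue;
                }
            }
            // No transition: the active handler keeps consuming lines.
            if (current != null) {
                current.continueBlock(line);
            }
        }
        if (current != null) {
            current.finishBlock();
        }
    }
}
```

In this sketch, a potential block start that no handler accepts is simply passed on to the active handler. That is one possible design choice, not a statement about how WiktionaryEntryParser behaves in that case.
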
protected static final Pattern COMMENT_PATTERN
protected static final Pattern IMAGE_PATTERN
protected static final Pattern REFERENCES_PATTERN
protected ILanguage language
protected String redirectTemplate
protected long entryId
protected List<IBlockHandler> handlers
public void parse(WiktionaryPage page, String text)
Specified by: parse in interface IWiktionaryEntryParser
protected abstract ParsingContext createParsingContext(WiktionaryPage page)
protected boolean checkForRedirect(WiktionaryPage page, String text)
protected abstract boolean isStartOfBlock(String line)
protected IBlockHandler selectHandler(String line)
protected void register(IBlockHandler handler)
public ILanguage getLanguage()
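
The kind of test that checkForRedirect performs can be sketched as follows. This is a rough illustration under stated assumptions, not the library's implementation: RedirectCheckSketch and findRedirectTarget are hypothetical names, the constructor argument is assumed to be the redirect keyword of the language edition (for example "REDIRECT"), and the sketch returns the target title instead of setting it on a WiktionaryPage. It relies only on the MediaWiki convention that a redirect page starts with `#REDIRECT [[Target]]`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of the documented API. */
final class RedirectCheckSketch {
    private final Pattern redirectPattern;

    /** redirectName is assumed to be the edition's redirect keyword, e.g. "REDIRECT". */
    RedirectCheckSketch(String redirectName) {
        // Matches "#KEYWORD [[Target" at the very start of the text and captures
        // the title up to the first ']', '|' or '#' (section anchor).
        this.redirectPattern = Pattern.compile(
                "^\\s*#" + Pattern.quote(redirectName) + "\\s*:?\\s*\\[\\[([^\\]|#]+)",
                Pattern.CASE_INSENSITIVE);
    }

    /** Returns the redirect target title, or null if the text is not a redirect. */
    String findRedirectTarget(String text) {
        if (text == null) {
            return null;
        }
        Matcher matcher = redirectPattern.matcher(text);
        return matcher.find() ? matcher.group(1).trim() : null;
    }
}
```

A caller in the spirit of checkForRedirect would treat a non-null result as "this page is a redirect", store the target on the page, and skip the regular block parsing.
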
Copyright © 2011-2016 Ubiquitous Knowledge Processing (UKP) Lab. All Rights Reserved.