public abstract class XMLDumpParser extends Object implements IWiktionaryDumpParser
Implementation of IWiktionaryDumpParser for processing XML dump files
downloaded from http://download.wikimedia.org/backup-index.html.
Specializations of this class can each focus on a certain aspect of the
dump, e.g., parsing the full text of the article pages and creating an
object structure from it, processing some aspects of the user pages, or
filtering the article pages. The base class should therefore remain
generic.

Modifier and Type | Class and Description |
---|---|
protected class | XMLDumpParser.XMLDumpHandler |

Modifier and Type | Field and Description |
---|---|
static String | BZ2_FILE_EXTENSION: The file extension for bzip2 files that is used for the automatic detection of the file format. |

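
As a rough illustration of extension-based format detection, the following sketch checks whether a file name ends with a bzip2 extension. It is not the library's code: the class `FormatDetectionSketch`, the method `looksLikeBzip2`, the value `"bz2"`, and the file names are all assumptions made for this example, since the page does not state the constant's actual value or how the detection is implemented.

```java
import java.io.File;
import java.util.Locale;

/** Hypothetical sketch of extension-based format detection. */
class FormatDetectionSketch {

    // Assumed value for illustration only; the documentation above does
    // not state what BZ2_FILE_EXTENSION actually contains.
    static final String BZ2_FILE_EXTENSION = "bz2";

    /** Returns true if the file name ends with ".bz2" (case-insensitive). */
    static boolean looksLikeBzip2(File dumpFile) {
        String name = dumpFile.getName().toLowerCase(Locale.ROOT);
        return name.endsWith("." + BZ2_FILE_EXTENSION);
    }

    public static void main(String[] args) {
        // Hypothetical file names; only the names are inspected, so the
        // files do not need to exist.
        System.out.println(looksLikeBzip2(new File("dump-pages-articles.xml.bz2")));
        System.out.println(looksLikeBzip2(new File("dump-pages-articles.xml")));
    }
}
```

A real parser would presumably use such a check to decide whether to wrap the file stream in a bzip2 decompressor before handing it to the XML parser.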
Constructor and Description |
---|
XMLDumpParser() |

Modifier and Type | Method and Description |
---|---|
protected abstract void | onElementEnd(String name, XMLDumpParser.XMLDumpHandler handler): Hotspot that is invoked for each closing XML element. |
protected abstract void | onElementStart(String name, XMLDumpParser.XMLDumpHandler handler): Hotspot that is invoked for each opening XML element. |
protected void | onParserEnd(): Hotspot that is invoked on finishing the parsing. |
protected void | onParserStart(): Hotspot that is invoked on starting the parser. |
void | parse(File dumpFile): Parses the given XML dump file. |
protected void | parseStream(InputStream in) |

Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface IWiktionaryDumpParser: getPageParsers, register
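
The hotspot methods listed above follow the template-method pattern: the base class drives the XML parser and subclasses fill in the callbacks. The sketch below illustrates that pattern using only the JDK's SAX classes. `SketchDumpParser`, `PageCountingParser`, and `HotspotSketchDemo` are hypothetical stand-ins, not the library's classes, and the hotspots here omit the `XMLDumpHandler` argument because its API is not described on this page. The `<page>` element name is likewise an assumption about the dump format.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Demo entry point for the sketch. */
class HotspotSketchDemo {
    public static void main(String[] args) throws IOException {
        String xml = "<mediawiki><page><title>a</title></page>"
                + "<page><title>b</title></page></mediawiki>";
        PageCountingParser parser = new PageCountingParser();
        parser.parseStream(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println("pages: " + parser.getPageCount()); // prints "pages: 2"
    }
}

/** Stand-in base class: runs SAX and forwards events to the hotspots. */
abstract class SketchDumpParser {
    protected void onParserStart() {}
    protected void onParserEnd() {}
    protected abstract void onElementStart(String name);
    protected abstract void onElementEnd(String name);

    protected void parseStream(InputStream in) throws IOException {
        try {
            SAXParser sax = SAXParserFactory.newInstance().newSAXParser();
            onParserStart();
            sax.parse(in, new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                        String qName, Attributes attributes) {
                    onElementStart(qName); // calls the enclosing parser's hotspot
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    onElementEnd(qName);
                }
            });
            onParserEnd();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IOException("XML parsing failed", e);
        }
    }
}

/** Example specialization: counts closed page elements. */
class PageCountingParser extends SketchDumpParser {
    private int pageCount;

    @Override
    protected void onElementStart(String name) {
        // Nothing to do when an element opens in this sketch.
    }

    @Override
    protected void onElementEnd(String name) {
        if ("page".equals(name)) {
            pageCount++;
        }
    }

    int getPageCount() {
        return pageCount;
    }
}
```

A specialization that builds an object structure or filters pages would follow the same shape, keeping its state in fields and updating it from the element hotspots.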
public static final String BZ2_FILE_EXTENSION
public void parse(File dumpFile) throws WiktionaryException
Parses the given XML dump file.
Specified by: parse in interface IWiktionaryDumpParser
Parameters: dumpFile - the dump file to parse
Throws: WiktionaryException - in case of any parser errors.
protected void parseStream(InputStream in) throws IOException
Throws: IOException
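
The two signatures above suggest a split between a file-level entry point and a stream-level worker. How the library connects them is not stated on this page; the sketch below shows only one plausible arrangement, in which `parse` opens the file, delegates to `parseStream`, and wraps any `IOException` in a parser-specific exception. `ParseDelegationSketch` and `SketchParserException` are hypothetical names, the exception is made unchecked purely as an assumption for brevity, and the stream worker just counts bytes instead of parsing XML.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch of a file-level method delegating to a stream-level one. */
class ParseDelegationSketch {

    /** Stand-in for a parser-specific exception type. */
    static class SketchParserException extends RuntimeException {
        SketchParserException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private long bytesRead;

    /** Stream-level worker; here it only consumes the stream. */
    protected void parseStream(InputStream in) throws IOException {
        while (in.read() != -1) {
            bytesRead++;
        }
    }

    /** File-level entry point: opens the file and wraps I/O failures. */
    public void parse(File dumpFile) {
        try (InputStream in = new FileInputStream(dumpFile)) {
            parseStream(in);
        } catch (IOException e) {
            throw new SketchParserException("Could not parse " + dumpFile, e);
        }
    }

    long getBytesRead() {
        return bytesRead;
    }

    public static void main(String[] args) {
        try {
            // Hypothetical path that is expected not to exist.
            new ParseDelegationSketch().parse(new File("missing-dump-file.xml"));
        } catch (SketchParserException e) {
            System.out.println("wrapped: " + e.getCause().getClass().getSimpleName());
        }
    }
}
```

With this arrangement callers of `parse` deal with a single exception type, while subclasses that override the stream-level method can still signal plain I/O failures.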
protected void onParserStart()
protected abstract void onElementStart(String name, XMLDumpParser.XMLDumpHandler handler)
protected abstract void onElementEnd(String name, XMLDumpParser.XMLDumpHandler handler)
protected void onParserEnd()
Copyright © 2011-2016 Ubiquitous Knowledge Processing (UKP) Lab. All Rights Reserved.