public class XmlXPathReader extends FileSetCollectionReaderBase
This reader is currently optimized for the TREC format, that is, the style in which TREC topics are presented. Provide an XPath expression that selects the parent nodes; the child nodes of each parent node are then stored in that parent's own CAS.
If your expression evaluates to leaf nodes, empty CASes will be created.
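The difference between an expression that selects parent nodes and one that selects leaf nodes can be illustrated with the JDK's own XPath API. This is only a sketch under stated assumptions: the class name `XPathParentNodesSketch`, the helper `childTagsPerMatch`, and the sample XML are hypothetical and are not part of XmlXPathReader, and the reader's actual processing may differ.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the reader: lists the child element
// names found under each node that an XPath expression selects.
final class XPathParentNodesSketch {

    static List<List<String>> childTagsPerMatch(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Evaluate the expression to a node set; each match plays the
        // role of one "parent node" in the description above.
        NodeList parents = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);

        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < parents.getLength(); i++) {
            List<String> childTags = new ArrayList<>();
            NodeList children = parents.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                // Only element children count; text nodes are skipped.
                if (children.item(j).getNodeType() == Node.ELEMENT_NODE) {
                    childTags.add(children.item(j).getNodeName());
                }
            }
            result.add(childTags);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input in a TREC-topic-like shape.
        String xml = "<topics>"
                + "<topic><num>1</num><title>first</title></topic>"
                + "<topic><num>2</num><title>second</title></topic>"
                + "</topics>";

        // Selecting the parent nodes: every match has child elements.
        System.out.println(childTagsPerMatch(xml, "/topics/topic"));

        // Selecting leaf nodes: every match has no child elements.
        System.out.println(childTagsPerMatch(xml, "/topics/topic/num"));
    }
}
```

With the first expression each match carries `num` and `title` children; with the second, each match has no child elements, which corresponds to the empty-CAS case described above.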
Modifier and Type | Class and Description |
---|---|
static class | XmlXPathReader.XmlNodes |
Modifier and Type | Field and Description |
---|---|
static String | PARAM_DOC_ID_TAG: Tag which contains the docId. |
static String | PARAM_EXCLUDE_TAGS: Tags which should be ignored. |
static String | PARAM_INCLUDE_TAGS: Tags which should be worked on. |
static String | PARAM_LANGUAGE: Language of the documents. |
static String | PARAM_SUBSTITUTE_TAGS: Tag names to substitute in the CAS. |
static String | PARAM_XPATH_EXPRESSION: Specifies the XPath expression that selects all nodes to be processed. |
EXCLUDE_PREFIX, INCLUDE_PREFIX, PARAM_CASE_SENSITIVE, PARAM_PATH, PARAM_PATTERNS, PARAM_SOURCE_LOCATION, PARAM_USE_DEFAULT_EXCLUDES
Constructor and Description |
---|
XmlXPathReader() |
Modifier and Type | Method and Description |
---|---|
void | getNext(org.apache.uima.cas.CAS cas) |
boolean | hasNext(): Checks whether there are still nodes to be processed. |
void | initialize(org.apache.uima.UimaContext arg0) |
getFileSetIterator, getIncludedFilesCount, getLanguage, getProgress, initCas, nextFile
close, getLogger, initialize
destroy, getCasInitializer, getProcessingResourceMetaData, initialize, isConsuming, reconfigure, setCasInitializer, typeSystemInit
getConfigParameterValue, getConfigParameterValue, setConfigParameterValue, setConfigParameterValue
getCasManager, getMetaData, getRelativePathResolver, getResourceManager, getUimaContext, getUimaContextAdmin, setLogger, setMetaData
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
public static final String PARAM_XPATH_EXPRESSION
public static final String PARAM_INCLUDE_TAGS
If this and PARAM_EXCLUDE_TAGS are both provided, only the tags in the set difference PARAM_INCLUDE_TAGS - PARAM_EXCLUDE_TAGS will be processed.
public static final String PARAM_EXCLUDE_TAGS
If this and PARAM_INCLUDE_TAGS are both provided, only the tags in the set difference PARAM_INCLUDE_TAGS - PARAM_EXCLUDE_TAGS will be processed.
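The set difference described for PARAM_INCLUDE_TAGS and PARAM_EXCLUDE_TAGS can be sketched with plain collections. The class `TagFilterSketch`, the method `effectiveTags`, and the tag names are hypothetical and only illustrate the arithmetic; they are not the reader's implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration of "include minus exclude".
final class TagFilterSketch {

    static Set<String> effectiveTags(String[] include, String[] exclude) {
        // Start from the included tags, keeping their order ...
        Set<String> tags = new LinkedHashSet<>(Arrays.asList(include));
        // ... and drop every tag that is also listed as excluded.
        tags.removeAll(Arrays.asList(exclude));
        return tags;
    }

    public static void main(String[] args) {
        String[] include = { "title", "desc", "narr" };
        String[] exclude = { "narr" };
        System.out.println(effectiveTags(include, exclude)); // prints [title, desc]
    }
}
```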
public static final String PARAM_LANGUAGE
public static final String PARAM_SUBSTITUTE_TAGS
Give the substitutions as consecutive pairs, each in before-after order. For example, to substitute "foo" with "bar" and "hey" with "ho", provide { "foo", "bar", "hey", "ho" }.
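One way to read such a flat before-after array is to fold it into a map from the original tag name to its replacement. The sketch below assumes that interpretation; `SubstituteTagsSketch` and `toSubstitutionMap` are hypothetical names, and how the reader parses the parameter internally is not documented here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the before-after pair layout.
final class SubstituteTagsSketch {

    static Map<String, String> toSubstitutionMap(String[] pairs) {
        if (pairs.length % 2 != 0) {
            throw new IllegalArgumentException("Expected an even number of entries (before, after, ...)");
        }
        Map<String, String> substitutions = new LinkedHashMap<>();
        // Even index: tag name to replace; following odd index: its replacement.
        for (int i = 0; i < pairs.length; i += 2) {
            substitutions.put(pairs[i], pairs[i + 1]);
        }
        return substitutions;
    }

    public static void main(String[] args) {
        String[] pairs = { "foo", "bar", "hey", "ho" };
        System.out.println(toSubstitutionMap(pairs)); // prints {foo=bar, hey=ho}
    }
}
```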
public static final String PARAM_DOC_ID_TAG
public void initialize(org.apache.uima.UimaContext arg0) throws org.apache.uima.resource.ResourceInitializationException
Overrides: initialize in class FileSetCollectionReaderBase
Throws: org.apache.uima.resource.ResourceInitializationException
public boolean hasNext() throws IOException, org.apache.uima.collection.CollectionException
After all nodes from the current file have been processed, the nodes from the next file are read in.
Specified by: hasNext in interface org.apache.uima.collection.base_cpm.BaseCollectionReader
Overrides: hasNext in class FileSetCollectionReaderBase
Throws: IOException, org.apache.uima.collection.CollectionException
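The behaviour described for hasNext, draining the nodes of the current file and then moving on to the next file, can be sketched as a queue that is refilled on demand. Everything in this sketch is an assumption made for illustration: `NodeQueueSketch` is a hypothetical class, files are modelled as lists of strings, and no UIMA types are involved.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical model: each "file" is a list of node labels.
final class NodeQueueSketch {

    private final Iterator<List<String>> files;
    private final Deque<String> pending = new ArrayDeque<>();

    NodeQueueSketch(List<List<String>> nodesPerFile) {
        this.files = nodesPerFile.iterator();
    }

    boolean hasNext() {
        // When the current file is exhausted, load nodes from the next
        // file; keep going so that files without nodes are skipped.
        while (pending.isEmpty() && files.hasNext()) {
            pending.addAll(files.next());
        }
        return !pending.isEmpty();
    }

    String getNext() {
        if (!hasNext()) {
            throw new NoSuchElementException("No nodes left");
        }
        return pending.removeFirst();
    }

    public static void main(String[] args) {
        NodeQueueSketch reader = new NodeQueueSketch(Arrays.asList(
                Arrays.asList("a", "b"),
                Collections.<String>emptyList(),
                Arrays.asList("c")));
        while (reader.hasNext()) {
            System.out.println(reader.getNext()); // prints a, b, c on separate lines
        }
    }
}
```

Doing the refill inside hasNext keeps getNext simple, at the cost of hasNext having side effects; that trade-off is a choice of this sketch only.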
public void getNext(org.apache.uima.cas.CAS cas) throws IOException
Throws: IOException
Copyright © 2007–2018 Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt. All rights reserved.