public class AclAnthologyReader extends ResourceCollectionReaderBase
Reads the ACL Anthology corpus and outputs CASes containing plain-text documents.
The reader attempts to remove hyphenation and to replace problematic characters in order to produce cleaned text. Apart from this cleanup, it behaves like a plain-text reader.
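The cleanup described above can be illustrated with a minimal, JDK-only sketch. It is not the reader's actual implementation: the class name `TextCleaningSketch`, the regular expression, and the particular character replacements are all assumptions chosen for illustration, and the real reader may apply different rules.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from AclAnthologyReader.
final class TextCleaningSketch {

    // A letter, a hyphen at the end of a line, then a lowercase letter
    // at the start of the next line (assumed de-hyphenation heuristic).
    private static final Pattern LINE_END_HYPHEN =
            Pattern.compile("(\\p{L})-\\r?\\n\\s*(\\p{Ll})");

    // Joins words that were split across a line break by a hyphen.
    static String dehyphenate(String text) {
        return LINE_END_HYPHEN.matcher(text).replaceAll("$1$2");
    }

    // Replaces a few characters that commonly cause trouble in text
    // extracted from PDFs (assumed selection, not an exhaustive list).
    static String replaceProblematicCharacters(String text) {
        return text
                .replace("\uFB01", "fi")   // "fi" ligature
                .replace("\uFB02", "fl")   // "fl" ligature
                .replace("\u00AD", "")     // soft hyphen
                .replace("\u201C", "\"")   // left curly quote
                .replace("\u201D", "\"")   // right curly quote
                // drop control characters except CR, LF and TAB
                .replaceAll("[\\p{Cntrl}&&[^\\r\\n\\t]]", "");
    }

    // Character replacement first, so ligatures cannot hide a split word.
    static String clean(String text) {
        return dehyphenate(replaceProblematicCharacters(text));
    }

    public static void main(String[] args) {
        String raw = "We evaluate the classi\uFB01ca-\ntion model on transla-\ntion data.";
        System.out.println(clean(raw));
    }
}
```

A heuristic of this kind cannot tell a line-break hyphen from a genuine compound hyphen, so a compound that happens to be split at its hyphen would also be joined. This is one reason the description above says the reader only "attempts" the cleanup.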
Inherited nested classes: ResourceCollectionReaderBase.Resource
Modifier and Type | Field and Description |
---|---|
static String | PARAM_SOURCE_ENCODING: Name of the configuration parameter that contains the character encoding used by the input files. |
Inherited fields: EXCLUDE_PREFIX, INCLUDE_PREFIX, JAR_PREFIX, KEY_RESOURCE_RESOLVER, PARAM_INCLUDE_HIDDEN, PARAM_LANGUAGE, PARAM_LOG_FREQ, PARAM_PATH, PARAM_PATTERNS, PARAM_SOURCE_LOCATION, PARAM_USE_DEFAULT_EXCLUDES
Constructor and Description |
---|
AclAnthologyReader() |
Modifier and Type | Method and Description |
---|---|
void | getNext(org.apache.uima.cas.CAS aCAS) |
Inherited methods:
getBase, getBase, getDefaultExcludes, getLanguage, getProgress, getResolver, getResourceIterator, getResources, getSourceLocation, hasNext, initCas, initCas, initialize, isSingleLocation, locationToUrl, nextFile, scan
close, getLogger, initialize
destroy, getCasInitializer, getProcessingResourceMetaData, initialize, isConsuming, reconfigure, setCasInitializer, typeSystemInit
getConfigParameterValue, getConfigParameterValue, setConfigParameterValue, setConfigParameterValue
getCasManager, getMetaData, getRelativePathResolver, getResourceManager, getUimaContext, getUimaContextAdmin, setLogger, setMetaData
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
public static final String PARAM_SOURCE_ENCODING
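To show why the source encoding matters, the following JDK-only sketch decodes the same bytes with a matching and a non-matching charset name. It illustrates the general effect of such a setting and is not the reader's own code: the class `SourceEncodingSketch` and the method `decode` are hypothetical names introduced for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not taken from AclAnthologyReader.
final class SourceEncodingSketch {

    // Decodes raw file bytes using a charset given by name, in the way an
    // encoding parameter value such as "UTF-8" or "ISO-8859-1" would be used.
    static String decode(byte[] raw, String encodingName) {
        Charset charset = Charset.forName(encodingName);
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        String original = "Technische Universit\u00E4t Darmstadt";
        byte[] latin1Bytes = original.getBytes(StandardCharsets.ISO_8859_1);

        // Matching encoding: the text round-trips unchanged.
        System.out.println(decode(latin1Bytes, "ISO-8859-1").equals(original));

        // Non-matching encoding: the umlaut byte is not valid UTF-8,
        // so the decoded text no longer equals the original.
        System.out.println(decode(latin1Bytes, "UTF-8").equals(original));
    }
}
```

If the configured encoding does not match the input files, non-ASCII characters are typically the first to be corrupted, while plain ASCII text decodes identically under both charsets used here.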
public void getNext(org.apache.uima.cas.CAS aCAS) throws IOException, org.apache.uima.collection.CollectionException
Throws: IOException, org.apache.uima.collection.CollectionException
Copyright © 2007–2018 Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt. All rights reserved.