public class TokenizedTextWriter extends JCasFileWriter_ImplBase
PARAM_FEATURE_PATH
.JCasFileWriter_ImplBase.NamedOutputStream
Modifier and Type | Field and Description |
---|---|
static String | PARAM_FEATURE_PATH: The feature path, e.g. de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Lemma/value for lemmas. |
static String | PARAM_NUMBER_REGEX: All tokens that match this regex are replaced by NUM. |
static String | PARAM_STOPWORDS_FILE: All the tokens listed in this file (one token per line) are replaced by STOP. |
static String | PARAM_TARGET_ENCODING: Encoding for the target file. |
JAR_PREFIX, PARAM_COMPRESSION, PARAM_ESCAPE_DOCUMENT_ID, PARAM_OVERWRITE, PARAM_SINGULAR_TARGET, PARAM_STRIP_EXTENSION, PARAM_TARGET_LOCATION, PARAM_USE_DOCUMENT_ID
Constructor and Description |
---|
TokenizedTextWriter() |
Modifier and Type | Method and Description |
---|---|
void | collectionProcessComplete() |
void | initialize(org.apache.uima.UimaContext context) |
void | process(org.apache.uima.jcas.JCas aJCas) |
getCompressionMethod, getOutputStream, getOutputStream, getRelativePath, getTargetLocation, isStripExtension, isUseDocumentId
getRequiredCasInterface, process
getCasInstancesRequired, hasNext, next
public static final String PARAM_TARGET_ENCODING
Encoding for the target file.
public static final String PARAM_FEATURE_PATH
The feature path, e.g. de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Lemma/value for lemmas. Default: de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token (i.e. token texts).
In order to specify a different annotation, use the annotation class' type name (e.g. Token.class.getTypeName()) and optionally append a field, e.g. /value, to specify the feature path. If you do not specify a field, the covered text is used.
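The feature path format described above (a type name, optionally followed by a slash and a field) can be illustrated with a small parser. This is a minimal sketch only: the class `FeaturePathSketch`, its nested `ParsedPath` type, and the rule of splitting at the first slash are assumptions made for illustration and are not taken from the writer's actual implementation.

```java
final class FeaturePathSketch {
    /** Hypothetical holder for the two parts of a feature path. */
    static final class ParsedPath {
        final String typeName;
        /** Field name after the slash, or null if the covered text should be used. */
        final String feature;

        ParsedPath(String typeName, String feature) {
            this.typeName = typeName;
            this.feature = feature;
        }
    }

    /** Splits "type/field" at the first slash; without a slash only the type is set. */
    static ParsedPath parse(String featurePath) {
        int slash = featurePath.indexOf('/');
        if (slash < 0) {
            return new ParsedPath(featurePath, null);
        }
        return new ParsedPath(featurePath.substring(0, slash),
                featurePath.substring(slash + 1));
    }

    public static void main(String[] args) {
        ParsedPath lemma = parse(
                "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Lemma/value");
        ParsedPath token = parse(
                "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token");

        // A missing field corresponds to "the covered text is used".
        System.out.println(lemma.typeName + " field=" + lemma.feature);
        System.out.println(token.typeName + " field="
                + (token.feature == null ? "(covered text)" : token.feature));
    }
}
```

With the default value, no field is present, which corresponds to writing the token texts themselves.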
public static final String PARAM_NUMBER_REGEX
All tokens that match this regex are replaced by NUM. Examples:
Make sure that these regular expressions fit the segmentation, e.g. if you work on tokens, your tokenizer might split prefixes such as + and - from the rest of the number.
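The interaction between the regex and the segmentation can be seen in a short sketch. The pattern below and the class `NumberRegexSketch` are hypothetical, and matching each token as a whole (rather than searching inside it) is an assumption about how the replacement is applied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class NumberRegexSketch {
    /** Replaces every token that fully matches the pattern by the string NUM. */
    static List<String> replaceNumbers(List<String> tokens, Pattern numberPattern) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            result.add(numberPattern.matcher(token).matches() ? "NUM" : token);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical pattern: optional sign, digits, optional separator groups.
        Pattern signed = Pattern.compile("[+-]?[0-9]+([.,][0-9]+)*");

        // Tokenizer keeps the sign attached: the whole token is replaced.
        System.out.println(replaceNumbers(Arrays.asList("-42", "apples"), signed));
        // prints [NUM, apples]

        // Tokenizer splits the sign off: the "-" token is left untouched.
        System.out.println(replaceNumbers(Arrays.asList("-", "42", "apples"), signed));
        // prints [-, NUM, apples]
    }
}
```

A pattern that required a leading sign would never match in the second case, which is why the regex should be chosen with the tokenizer's behavior in mind.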
public static final String PARAM_STOPWORDS_FILE
All the tokens listed in this file (one token per line) are replaced by STOP. Empty lines and lines starting with # are ignored. Casing is ignored.
public void initialize(org.apache.uima.UimaContext context) throws org.apache.uima.resource.ResourceInitializationException
Specified by: initialize in interface org.apache.uima.analysis_component.AnalysisComponent
Overrides: initialize in class org.apache.uima.fit.component.JCasConsumer_ImplBase
Throws: org.apache.uima.resource.ResourceInitializationException
public void process(org.apache.uima.jcas.JCas aJCas) throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
Specified by: process in class org.apache.uima.analysis_component.JCasAnnotator_ImplBase
Throws: org.apache.uima.analysis_engine.AnalysisEngineProcessException
public void collectionProcessComplete() throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
Specified by: collectionProcessComplete in interface org.apache.uima.analysis_component.AnalysisComponent
Overrides: collectionProcessComplete in class JCasFileWriter_ImplBase
Throws: org.apache.uima.analysis_engine.AnalysisEngineProcessException
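The stopword file rules given for PARAM_STOPWORDS_FILE (one token per line, empty lines and lines starting with # ignored, casing ignored) can be sketched as follows. The class `StopwordFileSketch` and its methods are hypothetical; trimming whitespace around each entry and lower-casing with `Locale.ROOT` are assumptions, not documented behavior of the writer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class StopwordFileSketch {
    /**
     * Builds a lower-cased stopword set from the lines of a stopword file,
     * e.g. as obtained from java.nio.file.Files.readAllLines.
     */
    static Set<String> readStopwords(List<String> lines) {
        Set<String> stopwords = new HashSet<>();
        for (String line : lines) {
            String entry = line.trim(); // assumption: surrounding whitespace is irrelevant
            if (entry.isEmpty() || entry.startsWith("#")) {
                continue; // skip empty lines and comment lines
            }
            stopwords.add(entry.toLowerCase(Locale.ROOT));
        }
        return stopwords;
    }

    /** Replaces every token found in the stopword set (ignoring case) by STOP. */
    static List<String> replaceStopwords(List<String> tokens, Set<String> stopwords) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            boolean isStopword = stopwords.contains(token.toLowerCase(Locale.ROOT));
            result.add(isStopword ? "STOP" : token);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> stopwords = readStopwords(
                Arrays.asList("# English stopwords", "", "The", "of"));
        System.out.println(replaceStopwords(
                Arrays.asList("THE", "end", "Of", "time"), stopwords));
    }
}
```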
Copyright © 2007–2016 Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt. All rights reserved.