public class TrailingCharacterRemover
extends org.apache.uima.fit.component.JCasAnnotator_ImplBase
Modifier and Type | Field and Description |
---|---|
static String | PARAM_MIN_TOKEN_LENGTH: All tokens that are shorter than the minimum token length after removing trailing characters are completely removed. |
static String | PARAM_PATTERN: A regex to be trimmed from the end of tokens. |
Constructor and Description |
---|
TrailingCharacterRemover() |
Modifier and Type | Method and Description |
---|---|
void | initialize(org.apache.uima.UimaContext context) |
void | process(org.apache.uima.jcas.JCas aJCas) |
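Methods inherited from class org.apache.uima.analysis_component.JCasAnnotator_ImplBase: getRequiredCasInterface, process
Methods inherited from class org.apache.uima.analysis_component.Annotator_ImplBase: getCasInstancesRequired, hasNext, next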
public static final String PARAM_PATTERN
A regex to be trimmed from the end of tokens.
Default: "[\\Q,-“^»*’()&/\"'©§'—«·=\\E0-9A-Z]+" (removes punctuation, special characters, digits, and capital letters).
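To illustrate what the default pattern covers, here is a minimal sketch in plain Java. It is not the component's actual implementation: the class name `DefaultPatternSketch`, the helper `stripTrailing`, and the choice to anchor the pattern with `$` are assumptions made for illustration. Only the pattern string itself is taken from the documentation above.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the documented API.
final class DefaultPatternSketch {

    // The documented default pattern. Non-ASCII characters are written as
    // Unicode escapes so the source compiles under any file encoding.
    static final String DEFAULT_PATTERN =
        "[\\Q,-\u201C^\u00BB*\u2019()&/\"'\u00A9\u00A7'\u2014\u00AB\u00B7=\\E0-9A-Z]+";

    // Removes one trailing run of characters matching the pattern.
    // Anchoring with '$' is an assumption about how "trailing" is enforced.
    static String stripTrailing(String token, String pattern) {
        Pattern atEnd = Pattern.compile("(?:" + pattern + ")$");
        return atEnd.matcher(token).replaceFirst("");
    }
}
```

Under these assumptions, `stripTrailing("word42)", DEFAULT_PATTERN)` yields `word`, and a token consisting only of capital letters such as `NASA` is reduced to the empty string. The full stop is not part of the character class, so `e.g.` is left unchanged.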
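public static final String PARAM_MIN_TOKEN_LENGTH
All tokens that are shorter than the minimum token length after removing trailing characters are completely removed. Tokens from which no trailing characters were removed are always retained, regardless of their length.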
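The interaction between trimming and the minimum length can be sketched as follows. This is a plain-Java illustration under stated assumptions, not the component's code: it works on strings rather than UIMA token annotations, and the names `MinTokenLengthSketch` and `trimTokens` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of the documented minimum-length rule.
final class MinTokenLengthSketch {

    static List<String> trimTokens(List<String> tokens, String pattern, int minTokenLength) {
        // Assumption: the pattern is only applied at the end of each token.
        Pattern atEnd = Pattern.compile("(?:" + pattern + ")$");
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String trimmed = atEnd.matcher(token).replaceFirst("");
            boolean wasTrimmed = !trimmed.equals(token);
            // Only tokens that were actually trimmed are subject to the length check.
            if (wasTrimmed && trimmed.length() < minTokenLength) {
                continue; // token is dropped entirely
            }
            kept.add(trimmed);
        }
        return kept;
    }
}
```

With the example pattern `"[,0-9A-Z]+"` and a minimum length of 2, the input `["Hello,", "USA", "word123", "a1", "x", "ok"]` becomes `["Hello", "word", "x", "ok"]`. `USA` and `a1` are dropped because their trimmed forms are too short, while `x` survives because nothing was trimmed from it.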
public void initialize(org.apache.uima.UimaContext context) throws org.apache.uima.resource.ResourceInitializationException
Specified by: initialize in interface org.apache.uima.analysis_component.AnalysisComponent
Overrides: initialize in class org.apache.uima.fit.component.JCasAnnotator_ImplBase
Throws: org.apache.uima.resource.ResourceInitializationException
public void process(org.apache.uima.jcas.JCas aJCas) throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
Specified by: process in class org.apache.uima.analysis_component.JCasAnnotator_ImplBase
Throws: org.apache.uima.analysis_engine.AnalysisEngineProcessException
Copyright © 2007–2018 Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt. All rights reserved.