public class BreakIteratorSegmenter extends SegmenterBase
Modifier and Type | Field and Description |
---|---|
static String | PARAM_SPLIT_AT_APOSTROPHE: By default, the Java BreakIterator does not split off contractions like John's into two tokens. |
Fields inherited from class SegmenterBase:
PARAM_LANGUAGE, PARAM_STRICT_ZONING, PARAM_WRITE_FORM, PARAM_WRITE_SENTENCE, PARAM_WRITE_TOKEN, PARAM_ZONE_TYPES
Constructor and Description |
---|
BreakIteratorSegmenter() |
Modifier and Type | Method and Description |
---|---|
protected void | process(org.apache.uima.jcas.JCas aJCas, String text, int zoneBegin) |
Methods inherited from class SegmenterBase:
createSentence, createToken, createToken, createToken, getLanguage, getLocale, getZoneTypes, isEmpty, isStrictZoning, isWriteSentence, isWriteToken, limit, process, trim, trimChar
getLogger, initialize
getRequiredCasInterface, process
getCasInstancesRequired, hasNext, next
public static final String PARAM_SPLIT_AT_APOSTROPHE
By default, the Java BreakIterator does not split off contractions like John's into two tokens. When this parameter is enabled, a non-default token split is generated when an apostrophe (') is encountered.
protected void process(org.apache.uima.jcas.JCas aJCas, String text, int zoneBegin) throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
Specified by: process in class SegmenterBase
Throws: org.apache.uima.analysis_engine.AnalysisEngineProcessException
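The behaviour described for PARAM_SPLIT_AT_APOSTROPHE and the role of zoneBegin can be illustrated with a small sketch that uses only java.text.BreakIterator. This is not the component's implementation: the class ApostropheSplitSketch, the Span type, and the tokenize method are hypothetical names, and placing the split immediately before the apostrophe is an assumption, since the exact split position is not stated here. The sketch also assumes that zoneBegin is the offset of the zone text within the whole document, so it is added to every token offset.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class ApostropheSplitSketch {

    /** Hypothetical token span; begin and end are offsets into the whole document. */
    static final class Span {
        final int begin;
        final int end;
        final String text;

        Span(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }

        @Override
        public String toString() {
            return begin + "-" + end + ":" + text;
        }
    }

    /**
     * Tokenizes one zone of text with the JDK word BreakIterator.
     * zoneBegin is assumed to be the offset of the zone inside the document.
     */
    static List<Span> tokenize(String text, int zoneBegin,
            boolean splitAtApostrophe, Locale locale) {
        List<Span> spans = new ArrayList<>();
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);

        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE;
                start = end, end = words.next()) {
            String piece = text.substring(start, end);
            if (piece.trim().isEmpty()) {
                continue; // skip the whitespace segments between words
            }
            // Assumption: split once, directly before the first apostrophe.
            int apos = splitAtApostrophe ? piece.indexOf('\'') : -1;
            if (apos > 0) {
                int mid = start + apos;
                spans.add(new Span(zoneBegin + start, zoneBegin + mid,
                        text.substring(start, mid)));
                spans.add(new Span(zoneBegin + mid, zoneBegin + end,
                        text.substring(mid, end)));
            } else {
                spans.add(new Span(zoneBegin + start, zoneBegin + end, piece));
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        // Same input, with and without the apostrophe split.
        System.out.println(tokenize("John's car", 10, false, Locale.US));
        System.out.println(tokenize("John's car", 10, true, Locale.US));
    }
}
```

With the flag off, the contraction stays a single token, matching the default BreakIterator behaviour described above. With the flag on, the sketch emits the part before the apostrophe and the part starting at the apostrophe as separate tokens.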
Copyright © 2007–2018 Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt. All rights reserved.