public class IcuSegmenter extends SegmenterBase
| Modifier and Type | Field and Description |
|---|---|
| static String | PARAM_SPLIT_AT_APOSTROPHE: By default, the segmenter does not split off contractions like John's into two tokens. |
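The effect of PARAM_SPLIT_AT_APOSTROPHE can be illustrated with a minimal, self-contained sketch. It is not the segmenter's implementation: it uses the JDK's java.text.BreakIterator as a stand-in word tokenizer, and the class ApostropheSplitSketch, the helper splitAtApostrophe, and the choice to start the second token at the apostrophe ("John" + "'s") are assumptions made for illustration only.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class ApostropheSplitSketch {

    // Hypothetical helper: splits a token in two at its first apostrophe.
    // Assumption: the apostrophe starts the second token ("John" + "'s").
    static List<String> splitAtApostrophe(String token) {
        List<String> parts = new ArrayList<>();
        int i = token.indexOf('\'');
        if (i > 0 && i < token.length() - 1) {
            parts.add(token.substring(0, i));
            parts.add(token.substring(i));
        } else {
            parts.add(token); // no apostrophe, or first apostrophe at either end
        }
        return parts;
    }

    // Tokenizes text with the JDK word BreakIterator (a stand-in, not ICU),
    // optionally applying the apostrophe split to every token.
    static List<String> tokenize(String text, boolean split) {
        List<String> tokens = new ArrayList<>();
        BreakIterator words = BreakIterator.getWordInstance(Locale.ENGLISH);
        words.setText(text);
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE;
                start = end, end = words.next()) {
            String segment = text.substring(start, end);
            if (segment.trim().isEmpty()) {
                continue; // the iterator also reports whitespace runs
            }
            if (split) {
                tokens.addAll(splitAtApostrophe(segment));
            } else {
                tokens.add(segment);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "John's book";
        System.out.println(tokenize(text, false)); // parameter disabled
        System.out.println(tokenize(text, true));  // parameter enabled
    }
}
```

The boolean argument plays the role of the parameter: with false the tokens are passed through unchanged, with true each token containing an inner apostrophe is split in two.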
Inherited fields: PARAM_LANGUAGE, PARAM_STRICT_ZONING, PARAM_WRITE_FORM, PARAM_WRITE_SENTENCE, PARAM_WRITE_TOKEN, PARAM_ZONE_TYPES

| Constructor and Description |
|---|
| IcuSegmenter() |

| Modifier and Type | Method and Description |
|---|---|
| protected void | process(org.apache.uima.jcas.JCas aJCas, String text, int zoneBegin) |

Inherited methods: createSentence, createToken, createToken, createToken, getLanguage, getLocale, getZoneTypes, isEmpty, isStrictZoning, isWriteSentence, isWriteToken, limit, process, getRequiredCasInterface, process, getCasInstancesRequired, hasNext, next

public static final String PARAM_SPLIT_AT_APOSTROPHE

By default, the segmenter does not split off contractions like John's into two tokens. When this parameter is enabled, a non-default token split is generated when an apostrophe (') is encountered.

protected void process(org.apache.uima.jcas.JCas aJCas,
String text,
int zoneBegin)
throws org.apache.uima.analysis_engine.AnalysisEngineProcessException

Specified by: process in class SegmenterBase

Throws: org.apache.uima.analysis_engine.AnalysisEngineProcessException

Copyright © 2007–2019 Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt. All rights reserved.