public class StanfordSegmenter extends SegmenterBase
Modifier and Type | Field and Description |
---|---|
static String | PARAM_ALLOW_EMPTY_SENTENCES: Whether to generate empty sentences. |
static String | PARAM_BOUNDARIES_TO_DISCARD: The set of regular expressions for sentence boundary tokens that should be discarded. |
static String | PARAM_BOUNDARY_FOLLOWERS: A set of strings, matched with .equals(), that may be tacked onto the end of a sentence after a sentence boundary token, for example ")". |
static String | PARAM_BOUNDARY_TOKEN_REGEX: The set of boundary tokens. |
static String | PARAM_IS_ONE_SENTENCE: Whether to treat all input as one sentence. |
static String | PARAM_LANGUAGE_FALLBACK |
static String | PARAM_NEWLINE_IS_SENTENCE_BREAK: Strategy for treating newlines as paragraph breaks. |
static String | PARAM_REGION_ELEMENT_REGEX: A regular expression for element names containing a sentence region. |
static String | PARAM_TOKEN_REGEXES_TO_DISCARD: The set of regular expressions for sentence boundary tokens that should be discarded. |
static String | PARAM_XML_BREAK_ELEMENTS_TO_DISCARD: Elements such as "p" or "sent", which will be wrapped into regular expressions for approximate XML matching. |
Fields inherited from class SegmenterBase: PARAM_LANGUAGE, PARAM_STRICT_ZONING, PARAM_WRITE_SENTENCE, PARAM_WRITE_TOKEN, PARAM_ZONE_TYPES
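The interplay of the boundary-related parameters can be hard to picture from the one-line descriptions. The following is a minimal sketch using only the JDK, not the Stanford implementation: the class `BoundarySketch` and its `split` method are hypothetical, and the exact matching rules (full-token regex matching, discarded boundaries always ending the current sentence) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical illustration of the boundary parameters; NOT the Stanford algorithm.
class BoundarySketch {

    /**
     * Groups tokens into sentences.
     *
     * @param boundaryTokenRegex  tokens fully matching this pattern end a sentence
     * @param boundaryFollowers   tokens (compared with equals) that may still be
     *                            attached after a boundary token, e.g. ")"
     * @param boundariesToDiscard regexes for tokens that end a sentence and are dropped
     * @param allowEmptySentences whether a discarded boundary may emit an empty sentence
     */
    static List<List<String>> split(List<String> tokens, Pattern boundaryTokenRegex,
            Set<String> boundaryFollowers, Set<String> boundariesToDiscard,
            boolean allowEmptySentences) {
        List<List<String>> sentences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        boolean afterBoundary = false;

        for (String token : tokens) {
            // Assumption: discard patterns must match the whole token.
            boolean discard = boundariesToDiscard.stream().anyMatch(token::matches);
            if (discard) {
                if (!current.isEmpty() || allowEmptySentences) {
                    sentences.add(current);
                }
                current = new ArrayList<>();
                afterBoundary = false;
                continue;
            }
            boolean isBoundary = boundaryTokenRegex.matcher(token).matches();
            // After a boundary token, only followers or further boundary tokens
            // stay in the same sentence; anything else starts a new one.
            if (afterBoundary && !isBoundary && !boundaryFollowers.contains(token)) {
                sentences.add(current);
                current = new ArrayList<>();
                afterBoundary = false;
            }
            current.add(token);
            if (isBoundary) {
                afterBoundary = true;
            }
        }
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("(", "Hi", "!", ")", "Bye", ".", "\n", "\n", "Ok");
        System.out.println(split(tokens, Pattern.compile("[.!?]+"),
                Set.of(")"), Set.of("\\n"), false));
    }
}
```

In this sketch the closing parenthesis stays with the first sentence because it is listed as a boundary follower, and the newline tokens end a sentence without appearing in the output. Passing `true` for `allowEmptySentences` additionally emits an empty sentence for the second newline.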
Constructor and Description |
---|
StanfordSegmenter() |
Modifier and Type | Method and Description |
---|---|
protected void | process(org.apache.uima.jcas.JCas aJCas, String aText, int aZoneBegin) |
Methods inherited from class SegmenterBase: createSentence, createToken, createToken, getLanguage, getLocale, getZoneTypes, isEmpty, isStrictZoning, isWriteSentence, isWriteToken, limit, process, trim, trimChar
getLogger, initialize
getRequiredCasInterface, process
getCasInstancesRequired, hasNext, next
public static final String PARAM_LANGUAGE_FALLBACK
public static final String PARAM_BOUNDARY_TOKEN_REGEX
See Also: WordToSentenceProcessor.WordToSentenceProcessor(java.lang.String, java.util.Set<java.lang.String>, java.util.Set<java.lang.String>, java.util.regex.Pattern, java.util.regex.Pattern), Constant Field Values
public static final String PARAM_BOUNDARY_FOLLOWERS
See Also: WordToSentenceProcessor.DEFAULT_BOUNDARY_FOLLOWERS, Constant Field Values
public static final String PARAM_XML_BREAK_ELEMENTS_TO_DISCARD
public static final String PARAM_BOUNDARIES_TO_DISCARD
See Also: WordToSentenceProcessor.DEFAULT_SENTENCE_BOUNDARIES_TO_DISCARD, Constant Field Values
public static final String PARAM_REGION_ELEMENT_REGEX
public static final String PARAM_NEWLINE_IS_SENTENCE_BREAK
public static final String PARAM_TOKEN_REGEXES_TO_DISCARD
public static final String PARAM_IS_ONE_SENTENCE
public static final String PARAM_ALLOW_EMPTY_SENTENCES
protected void process(org.apache.uima.jcas.JCas aJCas, String aText, int aZoneBegin) throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
Specified by: process in class SegmenterBase
Throws: org.apache.uima.analysis_engine.AnalysisEngineProcessException
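The signature of process suggests that the segmenter receives the text of one zone together with that zone's start position. The sketch below assumes, since this page does not state it, that aZoneBegin is the character offset of aText within the whole document, so offsets computed on the zone text must be shifted by it. The class `ZoneOffsetSketch` and the whitespace tokenizer are hypothetical stand-ins and use only the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of zone-relative versus document-relative offsets.
class ZoneOffsetSketch {

    /**
     * Finds whitespace-delimited tokens in the zone text and returns their
     * [begin, end) offsets relative to the document, not to the zone.
     */
    static List<int[]> tokenOffsets(String zoneText, int zoneBegin) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = Pattern.compile("\\S+").matcher(zoneText);
        while (m.find()) {
            // Shift zone-local offsets by the zone's start position.
            spans.add(new int[] { zoneBegin + m.start(), zoneBegin + m.end() });
        }
        return spans;
    }

    public static void main(String[] args) {
        String document = "<p>Hello world</p>";
        int zoneBegin = 3;                          // position just after "<p>"
        String zoneText = document.substring(3, 14); // "Hello world"
        for (int[] span : tokenOffsets(zoneText, zoneBegin)) {
            System.out.println(document.substring(span[0], span[1]));
        }
    }
}
```

Keeping the shift in one place means the returned spans can be used directly against the full document text, which is the form an annotation's begin and end positions would need.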
Copyright © 2007–2016 Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt. All rights reserved.