Class | Description |
---|---|
AnnotationByLengthFilter | Removes annotations that do not conform to minimum or maximum length constraints. |
BreakIteratorSegmenter | Segmenter based on java.text.BreakIterator. |
CamelCaseTokenSegmenter | Splits existing tokens again if they are camel-case text. |
GermanSeparatedParticleAnnotator | Annotator for post-processing German corpora that have been lemmatized and POS-tagged with the TreeTagger, based on the STTS tagset. |
LineBasedSentenceSegmenter | Deprecated. Use RegexSegmenter instead. |
ParagraphSplitter | Creates paragraph annotations for the given input document. |
PatternBasedTokenSegmenter | Splits existing tokens again at particular split characters. |
RegexSegmenter | Splits sentences and tokens based on regular expressions that define the sentence and token boundaries. |
TokenMerger | Merges all Token annotations that are covered by an annotation of a given type. |
TokenTrimmer | Removes prefixes and suffixes from tokens. |
WhitespaceSegmenter | Deprecated. Use RegexSegmenter instead. |
Enum | Description |
---|---|
TokenMerger.LemmaMode | |
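AnnotationByLengthFilter removes annotations outside a minimum/maximum length range. The sketch below is a standalone illustration, not the DKPro Core component, which operates on annotations in a UIMA CAS. It assumes that length means end offset minus begin offset and that both bounds are inclusive; the class `LengthFilterSketch` and the `Span` record are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch class; not part of DKPro Core.
class LengthFilterSketch {
    // Minimal stand-in for an annotation: character offsets into the document text.
    record Span(int begin, int end) {
        int length() {
            return end - begin;
        }
    }

    // Keeps only annotations whose length lies in [minLength, maxLength] (inclusive, by assumption).
    static List<Span> filter(List<Span> annotations, int minLength, int maxLength) {
        return annotations.stream()
                .filter(a -> a.length() >= minLength && a.length() <= maxLength)
                .collect(Collectors.toList());
    }
}
```

For example, filtering spans of length 1, 5 and 22 with bounds 2 and 10 keeps only the span of length 5.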
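BreakIteratorSegmenter is named after the JDK's java.text.BreakIterator. The sketch below shows, independently of DKPro Core and UIMA, how a locale-specific BreakIterator reports sentence and word boundaries. The class `BreakIteratorSketch` is hypothetical, and it returns strings rather than the offset-based annotations a real segmenter would create.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch class; not part of DKPro Core.
class BreakIteratorSketch {
    // Walks consecutive boundaries and collects the text between them. Surrounding
    // whitespace is stripped and whitespace-only segments are dropped, so the exact
    // character offsets a real segmenter would record are not preserved here.
    private static List<String> segment(BreakIterator it, String text) {
        List<String> out = new ArrayList<>();
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end).strip();
            if (!piece.isEmpty()) {
                out.add(piece);
            }
        }
        return out;
    }

    static List<String> sentences(String text, Locale locale) {
        return segment(BreakIterator.getSentenceInstance(locale), text);
    }

    static List<String> tokens(String text, Locale locale) {
        // The word instance reports each punctuation mark as its own segment.
        return segment(BreakIterator.getWordInstance(locale), text);
    }
}
```

With `Locale.ENGLISH`, `tokens("Hello, world!", ...)` yields `Hello`, `,`, `world` and `!`. Passing the locale explicitly matters because BreakIterator rules are locale-sensitive.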
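CamelCaseTokenSegmenter splits existing tokens that are camel-case text. The exact splitting rules of the DKPro Core component are not given here. The sketch below assumes one plausible rule: a boundary between a lowercase letter and a following uppercase letter. The class `CamelCaseSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch class; not part of DKPro Core.
class CamelCaseSketch {
    // Zero-width split point: preceded by a lowercase letter, followed by an uppercase letter.
    private static final String BOUNDARY = "(?<=\\p{Ll})(?=\\p{Lu})";

    static List<String> split(String token) {
        return Arrays.asList(token.split(BOUNDARY));
    }
}
```

Under this rule `camelCaseText` becomes `camel`, `Case`, `Text`, while an acronym run such as `HTTPServer` stays whole because it contains no lowercase-to-uppercase transition.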
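ParagraphSplitter creates paragraph annotations for a document. The sketch below produces paragraph offsets in plain Java, assuming that paragraphs are separated by one or more blank lines. That separator is a hypothetical choice and not necessarily the component's default. The class `ParagraphSketch` and its `Span` record are also hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch class; not part of DKPro Core.
class ParagraphSketch {
    record Span(int begin, int end) {}

    // Assumed separator: a line break, optional whitespace, then another line break.
    private static final Pattern SEPARATOR = Pattern.compile("\\n\\s*\\n");

    static List<Span> paragraphs(String text) {
        List<Span> out = new ArrayList<>();
        int start = 0;
        Matcher m = SEPARATOR.matcher(text);
        while (m.find()) {
            addIfNonBlank(out, text, start, m.start());
            start = m.end();
        }
        addIfNonBlank(out, text, start, text.length());
        return out;
    }

    // Skips stretches of whitespace so they do not become empty paragraphs.
    private static void addIfNonBlank(List<Span> out, String text, int begin, int end) {
        if (!text.substring(begin, end).isBlank()) {
            out.add(new Span(begin, end));
        }
    }
}
```

A single line break keeps text in the same paragraph. Only a blank line starts a new one.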
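PatternBasedTokenSegmenter splits existing tokens again at particular split characters. The sketch below illustrates that idea on a single token string. The class `SplitCharSketch` and the `keepSplitChars` flag are hypothetical. The flag stands in for the assumption that a split character may either be discarded or kept as a token of its own. How the real component decides this is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch class; not part of DKPro Core.
class SplitCharSketch {
    // Splits a token at every character contained in splitChars.
    static List<String> split(String token, String splitChars, boolean keepSplitChars) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (splitChars.indexOf(c) >= 0) {
                // Close the piece collected so far, if any.
                if (current.length() > 0) {
                    out.add(current.toString());
                    current.setLength(0);
                }
                if (keepSplitChars) {
                    out.add(String.valueOf(c));
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }
}
```

For example, `split("state-of-the-art", "-", false)` gives `state`, `of`, `the`, `art`, whereas `split("and/or", "/", true)` keeps the slash as a token of its own.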
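RegexSegmenter uses one regular expression for sentence boundaries and another for token boundaries. The sketch below applies that two-level scheme with java.util.regex. It first splits the text at sentence-boundary matches and then splits each sentence at token-boundary matches. The class `RegexSegmenterSketch` is hypothetical, the example patterns are arbitrary, and a real annotator would record character offsets instead of strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch class; not part of DKPro Core.
class RegexSegmenterSketch {
    // Text between two boundary matches is a segment; empty segments are skipped.
    static List<String> segments(String text, Pattern boundary) {
        List<String> out = new ArrayList<>();
        for (String piece : boundary.split(text)) {
            if (!piece.isEmpty()) {
                out.add(piece);
            }
        }
        return out;
    }

    // Sentences first, then tokens within each sentence.
    static List<List<String>> segment(String text, Pattern sentenceBoundary, Pattern tokenBoundary) {
        List<List<String>> sentences = new ArrayList<>();
        for (String sentence : segments(text, sentenceBoundary)) {
            List<String> tokens = segments(sentence, tokenBoundary);
            if (!tokens.isEmpty()) {
                sentences.add(tokens);
            }
        }
        return sentences;
    }
}
```

With a line break as the sentence boundary and `\s+` as the token boundary, this covers what the deprecated line-based and whitespace segmenters do.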
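TokenMerger merges tokens that are covered by an annotation of a given type, for example a multi-word named entity. The sketch below works on offset spans. It assumes that a token is covered when it lies entirely within the covering annotation, and that consecutive covered tokens become one token from the first token's begin to the last token's end. The class `TokenMergerSketch` is hypothetical, and lemma handling (what TokenMerger.LemmaMode controls in the real component) is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch class; not part of DKPro Core. Lemma handling is not modeled.
class TokenMergerSketch {
    record Span(int begin, int end) {}

    // Tokens are assumed to be sorted by begin offset.
    static List<Span> merge(List<Span> tokens, List<Span> covers) {
        List<Span> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            Span first = tokens.get(i);
            Span cover = findCover(first, covers);
            if (cover == null) {
                out.add(first);
                i++;
                continue;
            }
            // Extend over all following tokens inside the same covering annotation.
            int j = i;
            while (j + 1 < tokens.size() && inside(tokens.get(j + 1), cover)) {
                j++;
            }
            out.add(new Span(first.begin(), tokens.get(j).end()));
            i = j + 1;
        }
        return out;
    }

    static boolean inside(Span token, Span cover) {
        return token.begin() >= cover.begin() && token.end() <= cover.end();
    }

    static Span findCover(Span token, List<Span> covers) {
        for (Span c : covers) {
            if (inside(token, c)) {
                return c;
            }
        }
        return null;
    }
}
```

For the text `New York is big` with a covering annotation over `New York` (offsets 0 to 8), the tokens `New` and `York` become a single token spanning 0 to 8.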
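TokenTrimmer removes prefixes and suffixes from tokens. The sketch below trims a token string against lists of configured prefixes and suffixes. It assumes that at most one matching prefix and one matching suffix are removed, with the first match in list order winning. The class `TokenTrimmerSketch` is hypothetical. The actual component works on Token annotations, presumably by adjusting their offsets rather than returning new strings.

```java
import java.util.List;

// Hypothetical sketch class; not part of DKPro Core.
class TokenTrimmerSketch {
    // Removes the first matching prefix, then the first matching suffix (assumed behavior).
    static String trim(String token, List<String> prefixes, List<String> suffixes) {
        String result = token;
        for (String p : prefixes) {
            if (!p.isEmpty() && result.startsWith(p)) {
                result = result.substring(p.length());
                break;
            }
        }
        for (String s : suffixes) {
            if (!s.isEmpty() && result.endsWith(s)) {
                result = result.substring(0, result.length() - s.length());
                break;
            }
        }
        return result;
    }
}
```

Listing longer suffixes before shorter ones (for example `).` before `)`) keeps a shorter suffix from matching first.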
Copyright © 2007–2018 Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt. All rights reserved.