| Class | Description |
|---|---|
| AnnotationByLengthFilter | Removes annotations that do not conform to minimum or maximum length constraints. |
| BreakIteratorSegmenter | Segmenter based on the Java `BreakIterator`. |
| CamelCaseTokenSegmenter | Splits existing tokens again if they are camel-case text. |
| GermanSeparatedParticleAnnotator | Post-processes German corpora that have been lemmatized and POS-tagged with the TreeTagger, based on the STTS tagset. |
| LineBasedSentenceSegmenter | Deprecated. Use RegexSegmenter instead. |
| ParagraphSplitter | Creates paragraph annotations for the given input document. |
| PatternBasedTokenSegmenter | Splits existing tokens again at particular split characters. |
| RegexSegmenter | Splits sentences and tokens based on regular expressions that define the sentence and token boundaries. |
| StopWordRemover | Removes all annotations of the specified types from the CAS if their covered text is in the stop word dictionary. |
| TokenMerger | Merges any tokens that are covered by a given annotation type. |
| TokenTrimmer | Removes prefixes and suffixes from tokens. |
| WhitespaceSegmenter | Deprecated. Use RegexSegmenter instead. |
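
The BreakIteratorSegmenter entry names the JDK class it builds on. The sketch below shows only that underlying technique: sentence and word boundaries are taken from `java.text.BreakIterator` using nothing but the JDK. It is not the DKPro annotator or its API. The class `BreakIteratorSketch` and its methods are hypothetical. Returning surface strings instead of begin/end offsets, and dropping whitespace-only spans while keeping punctuation as separate tokens, are also illustrative assumptions.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch of BreakIterator-based segmentation (JDK only).
 * Not the DKPro BreakIteratorSegmenter API.
 */
public class BreakIteratorSketch {

    /** Collects the text between consecutive boundaries, skipping whitespace-only spans. */
    private static List<String> segment(BreakIterator it, String text) {
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String span = text.substring(start, end).trim();
            if (!span.isEmpty()) {
                result.add(span);
            }
        }
        return result;
    }

    /** Sentences as reported by the locale's sentence BreakIterator. */
    static List<String> sentences(String text, Locale locale) {
        return segment(BreakIterator.getSentenceInstance(locale), text);
    }

    /** Word-level tokens; punctuation marks come back as separate tokens. */
    static List<String> words(String text, Locale locale) {
        return segment(BreakIterator.getWordInstance(locale), text);
    }

    public static void main(String[] args) {
        String text = "The cat sat. The dog ran away!";
        System.out.println(sentences(text, Locale.ENGLISH));
        System.out.println(words(text, Locale.ENGLISH));
    }
}
```

An annotator built this way would normally record each boundary pair as the begin and end offsets of a sentence or token annotation rather than copying the text.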

| Enum | Description |
|---|---|
| TokenMerger.LemmaMode | |
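
The CamelCaseTokenSegmenter entry above only says that camel-case tokens are split again. It does not state the exact rule. The sketch below assumes one common rule, written as a zero-width regular expression. A token is split at every lowercase-to-uppercase transition. It is also split before the last capital of an uppercase run that is followed by a lowercase letter, so "XMLParser" becomes "XML" and "Parser". The class `CamelCaseSplitSketch` and the boundary rule are hypothetical and may differ from the real annotator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical camel-case splitting sketch; the boundary rule is an assumption. */
public class CamelCaseSplitSketch {

    // Zero-width boundaries: lowercase followed by uppercase ("camelCase"),
    // or uppercase followed by an uppercase+lowercase pair ("XMLParser").
    private static final Pattern BOUNDARY =
            Pattern.compile("(?<=\\p{Ll})(?=\\p{Lu})|(?<=\\p{Lu})(?=\\p{Lu}\\p{Ll})");

    /** Splits one token into its camel-case parts; non-camel-case tokens come back unchanged. */
    static List<String> split(String token) {
        List<String> parts = new ArrayList<>();
        for (String part : BOUNDARY.split(token)) {
            if (!part.isEmpty()) { // guards against the empty input string
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("getHTTPResponseCode"));
    }
}
```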
Copyright © 2007–2019 Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt. All rights reserved.