Class | Description |
---|---|
AnnotationByLengthFilter | Removes annotations that do not conform to minimum or maximum length constraints. |
BreakIteratorSegmenter | BreakIterator segmenter. |
CamelCaseTokenSegmenter | Split up existing tokens again if they are camel-case text. |
GermanSeparatedParticleAnnotator | Annotator to be used for post-processing of German corpora that have been lemmatized and POS-tagged with the TreeTagger, based on the STTS tagset. |
LineBasedSentenceSegmenter | Annotates each line in the source text as a sentence. |
ParagraphSplitter | This class creates paragraph annotations for the given input document. |
PatternBasedTokenSegmenter | Split up existing tokens again at particular split-chars. |
RegexTokenizer | This segmenter splits sentences and tokens based on regular expressions that define the sentence and token boundaries. |
TokenMerger | Merges any Tokens that are covered by a given annotation type. |
TokenTrimmer | Remove prefixes and suffixes from tokens. |
WhitespaceTokenizer | Deprecated. Use RegexTokenizer. |
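
The BreakIteratorSegmenter entry gives no detail on how boundary-based segmentation works, so the following is a minimal sketch of the general technique using only the JDK's `java.text.BreakIterator`. It is not the component's implementation: the class name `BreakIteratorSketch` and the helper methods `split`, `sentences`, and `tokens` are hypothetical names chosen for illustration, and no UIMA or DKPro types are involved.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of the documented package.
public class BreakIteratorSketch {

    // Collects the non-blank spans between consecutive boundaries reported by the iterator.
    static List<String> split(BreakIterator it, String text) {
        List<String> spans = new ArrayList<>();
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String span = text.substring(start, end).trim();
            if (!span.isEmpty()) {
                spans.add(span);
            }
        }
        return spans;
    }

    // Sentence boundaries according to the locale's sentence rules.
    static List<String> sentences(String text, Locale locale) {
        return split(BreakIterator.getSentenceInstance(locale), text);
    }

    // Word boundaries; whitespace-only spans are dropped, punctuation is kept.
    static List<String> tokens(String text, Locale locale) {
        return split(BreakIterator.getWordInstance(locale), text);
    }

    public static void main(String[] args) {
        for (String sentence : sentences("Hello world. How are you?", Locale.US)) {
            System.out.println(sentence + " -> " + tokens(sentence, Locale.US));
        }
    }
}
```

A real segmenter component would record character offsets as annotations rather than return strings; the sketch returns strings only to keep the example self-contained.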
Enum | Description |
---|---|
TokenMerger.LemmaMode |
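
The class summary describes two components that split existing tokens again: CamelCaseTokenSegmenter (at camel-case boundaries) and PatternBasedTokenSegmenter (at particular split characters). A rough sketch of both ideas on plain strings might look like the following. The class `TokenSplitSketch` and its methods are hypothetical, and the exact boundary rules and parameters of the real components are not stated in this summary, so the rules below (a lowercase letter followed by an uppercase letter, and split characters being dropped) are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of the documented package.
public class TokenSplitSketch {

    // Assumed rule: a boundary lies between a lowercase and an uppercase ASCII letter.
    private static final Pattern CAMEL_BOUNDARY = Pattern.compile("(?<=[a-z])(?=[A-Z])");

    // Splits a token at zero-width camel-case boundaries; no characters are removed.
    static List<String> splitCamelCase(String token) {
        return Arrays.asList(CAMEL_BOUNDARY.split(token));
    }

    // Splits a token at any character contained in splitChars.
    // Assumption: the split characters themselves are dropped from the result.
    static List<String> splitAtChars(String token, String splitChars) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : token.toCharArray()) {
            if (splitChars.indexOf(c) >= 0) {
                if (current.length() > 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitCamelCase("getFileName"));
        System.out.println(splitAtChars("foo-bar_baz", "-_"));
    }
}
```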
Copyright © 2007–2016 Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt. All rights reserved.