de.tudarmstadt.ukp.dkpro.core.tokit (DKPro Core 1.8.0 API)

Class Summary
Class	Description
AnnotationByLengthFilter	Removes annotations that do not conform to minimum or maximum length constraints.
BreakIteratorSegmenter	Segments text into sentences and tokens using `java.text.BreakIterator`.
CamelCaseTokenSegmenter	Splits existing tokens again if they are camel-case text.
GermanSeparatedParticleAnnotator	Annotator to be used for post-processing of German corpora that have been lemmatized and POS-tagged with the TreeTagger, based on the STTS tagset.
LineBasedSentenceSegmenter	Annotates each line in the source text as a sentence.
ParagraphSplitter	Creates paragraph annotations for the given input document.
PatternBasedTokenSegmenter	Splits existing tokens again at particular split characters.
RegexTokenizer	This segmenter splits sentences and tokens based on regular expressions that define the sentence and token boundaries.
TokenMerger	Merges any Tokens that are covered by a given annotation type.
TokenTrimmer	Removes prefixes and suffixes from tokens.
WhitespaceTokenizer	Deprecated. Use `RegexTokenizer` instead.

Enum Summary
Enum	Description
TokenMerger.LemmaMode

Package de.tudarmstadt.ukp.dkpro.core.tokit Description

Collection of tokenization and segmentation components.
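
The following is a minimal, standalone sketch of the kind of sentence and token segmentation these components perform. It uses only the JDK's `java.text.BreakIterator` and does not call any DKPro Core or UIMA API. The class name `SegmentationSketch` and its methods are hypothetical and exist only for illustration. The actual components operate on UIMA CAS annotations and may differ in behavior.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of DKPro Core.
public class SegmentationSketch {

    /** Splits text into sentences with the JDK's locale-aware sentence iterator. */
    public static List<String> sentences(String text, Locale locale) {
        return split(text, BreakIterator.getSentenceInstance(locale));
    }

    /** Splits text into tokens with the JDK's word iterator. */
    public static List<String> tokens(String text, Locale locale) {
        return split(text, BreakIterator.getWordInstance(locale));
    }

    // Walks consecutive boundaries and collects the non-blank segments between them.
    private static List<String> split(String text, BreakIterator iterator) {
        List<String> segments = new ArrayList<>();
        iterator.setText(text);
        int start = iterator.first();
        for (int end = iterator.next(); end != BreakIterator.DONE;
                start = end, end = iterator.next()) {
            String segment = text.substring(start, end).trim();
            if (!segment.isEmpty()) {
                // Whitespace-only segments (the gaps between words) are dropped.
                segments.add(segment);
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        String text = "This is a test. It has two sentences.";
        for (String sentence : sentences(text, Locale.US)) {
            System.out.println(sentence + " -> " + tokens(sentence, Locale.US));
        }
    }
}
```

In this sketch the segments are returned as plain strings. A UIMA-based component would instead record the start and end offsets of each segment as annotations on the document.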