org.dkpro.core.tokit (DKPro Core 2.0.0 API)

Class Summary
Class	Description
AnnotationByLengthFilter	Removes annotations that do not conform to minimum or maximum length constraints.
BreakIteratorSegmenter	Segments text into sentences and tokens using the Java `BreakIterator`.
CamelCaseTokenSegmenter	Split up existing tokens further if they are camel-case text.
GermanSeparatedParticleAnnotator	Annotator to be used for post-processing of German corpora that have been lemmatized and POS-tagged with the TreeTagger, based on the STTS tagset.
LineBasedSentenceSegmenter	Deprecated. Use `RegexSegmenter` instead.
ParagraphSplitter	This class creates paragraph annotations for the given input document.
PatternBasedTokenSegmenter	Split up existing tokens again at particular split-chars.
RegexSegmenter	This segmenter splits sentences and tokens based on regular expressions that define the sentence and token boundaries.
StopWordRemover	Remove all of the specified types from the CAS if their covered text is in the stop word dictionary.
TokenMerger	Merges any Tokens that are covered by a given annotation type.
TokenTrimmer	Remove prefixes and suffixes from tokens.
WhitespaceSegmenter	Deprecated. Use `RegexSegmenter` instead.
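
To make the idea behind BreakIterator-based segmentation concrete, here is a minimal sketch that uses only the JDK class `java.text.BreakIterator`. It does not use DKPro Core or UIMA, and it is not the implementation of `BreakIteratorSegmenter`. The class name `BreakIteratorSketch` and its methods are hypothetical and exist only for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Plain-JDK illustration; hypothetical class, not part of DKPro Core. */
final class BreakIteratorSketch {

    /** Cuts the text at the iterator's boundaries and drops whitespace-only pieces. */
    static List<String> split(BreakIterator iterator, String text) {
        List<String> parts = new ArrayList<>();
        iterator.setText(text);
        int start = iterator.first();
        for (int end = iterator.next(); end != BreakIterator.DONE;
                start = end, end = iterator.next()) {
            String part = text.substring(start, end).trim();
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    /** Sentence boundaries as determined by the JDK for the given locale. */
    static List<String> sentences(String text, Locale locale) {
        return split(BreakIterator.getSentenceInstance(locale), text);
    }

    /** Word boundaries as determined by the JDK; punctuation becomes its own piece. */
    static List<String> tokens(String text, Locale locale) {
        return split(BreakIterator.getWordInstance(locale), text);
    }

    public static void main(String[] args) {
        String text = "Hello world. How are you?";
        for (String sentence : sentences(text, Locale.US)) {
            System.out.println(sentence + " -> " + tokens(sentence, Locale.US));
        }
    }
}
```

In a real pipeline the boundaries would be stored as annotations with begin and end offsets rather than as trimmed strings; the sketch returns strings only to keep the example short.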

Enum Summary
Enum	Description
TokenMerger.LemmaMode

Package org.dkpro.core.tokit Description

Collection of tokenization and segmentation components.
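
Several components in this package split tokens that already exist, for example at camel-case boundaries or at particular split characters. The following sketch shows these two operations on plain strings. It is an illustration under stated assumptions, not the code of `CamelCaseTokenSegmenter` or `PatternBasedTokenSegmenter`: the class `TokenSplitSketch` and its methods are hypothetical, a camel-case boundary is assumed to be a lowercase letter followed by an uppercase letter, and keeping each split character as its own piece is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Plain-JDK illustration; hypothetical class, not part of DKPro Core. */
final class TokenSplitSketch {

    // Zero-width position between a lowercase letter and an uppercase letter.
    private static final Pattern CAMEL_BOUNDARY =
            Pattern.compile("(?<=\\p{Ll})(?=\\p{Lu})");

    /** Splits a token at every lowercase-to-uppercase transition. */
    static List<String> splitCamelCase(String token) {
        return Arrays.asList(CAMEL_BOUNDARY.split(token));
    }

    /**
     * Splits a token at any character contained in splitChars.
     * Assumption of this sketch: each split character is kept as its own piece.
     */
    static List<String> splitAtChars(String token, String splitChars) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : token.toCharArray()) {
            if (splitChars.indexOf(c) >= 0) {
                if (current.length() > 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
                parts.add(String.valueOf(c));
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitCamelCase("camelCaseText"));
        System.out.println(splitAtChars("well-known/used", "-/"));
    }
}
```

With this boundary definition, a run of capitals such as `HTMLParser` is left as a single piece, because no lowercase letter precedes an uppercase one.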