Package | Description
---|---
de.tudarmstadt.ukp.dkpro.core.api.transform |
de.tudarmstadt.ukp.dkpro.core.languagetool | Grammar and style checker based on LanguageTool.
de.tudarmstadt.ukp.dkpro.core.stanfordnlp | Integration of NLP components from the Stanford CoreNLP suite.
de.tudarmstadt.ukp.dkpro.core.textnormalizer |
de.tudarmstadt.ukp.dkpro.core.textnormalizer.frequency |
de.tudarmstadt.ukp.dkpro.core.textnormalizer.transformation |

Modifier and Type | Class and Description
---|---
class | JCasTransformerChangeBased_ImplBase: Base class for normalizers that perform insert, delete, and replace operations.

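
The idea of a normalizer that works through insert/delete/replace operations can be illustrated with a minimal, library-independent sketch. The `Change` class and the `apply` method below are hypothetical names chosen for illustration and are not part of the DKPro Core API. The sketch assumes the changes do not overlap and applies them from the highest offset to the lowest, so that offsets computed against the original text stay valid.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ChangeSketch {
    // Hypothetical change record: replace text[begin, end) with replacement.
    static final class Change {
        final int begin;
        final int end;
        final String replacement;

        Change(int begin, int end, String replacement) {
            this.begin = begin;
            this.end = end;
            this.replacement = replacement;
        }
    }

    // An insert is a replace of an empty span; a delete replaces with "".
    static Change insert(int pos, String text) { return new Change(pos, pos, text); }
    static Change delete(int begin, int end) { return new Change(begin, end, ""); }
    static Change replace(int begin, int end, String text) { return new Change(begin, end, text); }

    // Applies non-overlapping changes, highest offset first, so that
    // earlier offsets are not shifted by later edits.
    static String apply(String text, List<Change> changes) {
        List<Change> sorted = new ArrayList<>(changes);
        sorted.sort(Comparator.comparingInt((Change c) -> c.begin).reversed());
        StringBuilder sb = new StringBuilder(text);
        for (Change c : sorted) {
            sb.replace(c.begin, c.end, c.replacement);
        }
        return sb.toString();
    }
}
```

All three operations reduce to a single span replacement, which is why one `apply` loop is enough in this sketch.
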
Modifier and Type | Class and Description
---|---
class | CjfNormalizer: Converts traditional Chinese to simplified Chinese or vice versa.

Modifier and Type | Class and Description
---|---
class | StanfordPtbTransformer: Uses the normalizing tokenizer of the Stanford CoreNLP tools to escape the text PTB-style.

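
To give a sense of what PTB-style escaping looks like, the following sketch rewrites brackets into the conventional Penn Treebank tokens (`-LRB-`, `-RRB-`, and so on). It covers only the bracket subset and does not call the Stanford tokenizer; the class and method names are hypothetical, and the real component may normalize further characters such as quotes.

```java
import java.util.HashMap;
import java.util.Map;

final class PtbBracketSketch {
    // Bracket escapes used in Penn Treebank style text.
    private static final Map<Character, String> ESCAPES = new HashMap<>();
    static {
        ESCAPES.put('(', "-LRB-");
        ESCAPES.put(')', "-RRB-");
        ESCAPES.put('[', "-LSB-");
        ESCAPES.put(']', "-RSB-");
        ESCAPES.put('{', "-LCB-");
        ESCAPES.put('}', "-RCB-");
    }

    // Replaces each bracket character with its PTB token; all other
    // characters are copied unchanged.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String esc = ESCAPES.get(c);
            if (esc != null) {
                sb.append(esc);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```
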
Modifier and Type | Class and Description
---|---
class | SpellingNormalizer: Converts annotations of the type SpellingAnomaly into a SofaChangeAnnotation.

Modifier and Type | Class and Description
---|---
class | CapitalizationNormalizer: Takes a text and replaces wrong capitalization.
class | ExpressiveLengtheningNormalizer: Takes a text and shortens extra-long words.
class | ReplacementFrequencyNormalizer_ImplBase: Base class for all normalizers that need a frequency provider and replace based on a list.
class | SharpSNormalizer: Takes a text and replaces sharp s.
class | UmlautNormalizer: Takes a text, checks for umlauts written as "ae", "oe", or "ue", and normalizes them if they really are umlauts according to a frequency model.

Modifier and Type | Class and Description
---|---
class | DictionaryBasedTokenTransformer: Reads a tab-separated file containing mappings from one token to another.
class | FileBasedTokenTransformer: Replaces all tokens that are listed in the file in FileBasedTokenTransformer.PARAM_MODEL_LOCATION by the string specified in FileBasedTokenTransformer.PARAM_REPLACEMENT.
class | HyphenationRemover: Simple dictionary-based hyphenation remover.
class | RegexBasedTokenTransformer: A JCasTransformerChangeBased_ImplBase implementation that replaces tokens based on regular expressions.
class | TokenCaseTransformer: Changes tokens to follow a specific casing: all upper case, all lower case, or 'normal case' (lowercase everything but the first character of a token and the characters immediately following a hyphen).

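
The 'normal case' rule described for TokenCaseTransformer can be made concrete with a small standalone sketch. It reads the description literally: the first character and any character directly after a hyphen are left as they are, and everything else is lowercased. Whether the real component also upper-cases those characters is not stated here, so that reading is an assumption, and the class and method names are hypothetical.

```java
final class NormalCaseSketch {
    // Lowercases every character except the first one and those that
    // immediately follow a hyphen, which are copied unchanged.
    static String normalCase(String token) {
        StringBuilder sb = new StringBuilder(token.length());
        boolean keep = true; // the first character is kept as-is
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            sb.append(keep ? c : Character.toLowerCase(c));
            keep = (c == '-'); // the character after a hyphen is kept as-is
        }
        return sb.toString();
    }
}
```

Under this reading, "NEW-YORK" becomes "New-York", while a token that starts in lower case stays in lower case throughout.
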
Copyright © 2007–2016 Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt. All rights reserved.