- All Implemented Interfaces:
- org.apache.uima.analysis_component.AnalysisComponent
public class GermanSeparatedParticleAnnotator
extends org.apache.uima.fit.component.JCasAnnotator_ImplBase
Annotator to be used for post-processing of German corpora that have been lemmatized and POS-tagged with the
TreeTagger, based on the STTS tagset.
This annotator deals with German particle verbs. A particle verb consists of a particle and a stem, e.g., anfangen = an + fangen.
In many usages of German particle verbs, the stem and the particle are separated, e.g., Wir fangen gleich an.
The TreeTagger lemmatizes the verb stem as "fangen" and the separated particle as "an",
so the proper verb lemma "anfangen" is not available as an annotation.
The GermanSeparatedParticleAnnotator replaces the lemma of the stem of a particle verb (e.g., fangen) with the proper verb lemma
(e.g., anfangen) and leaves the lemma of the separated particle unchanged.
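
The lemma replacement described above can be illustrated with a minimal, self-contained sketch. It is not this class's actual implementation and does not use the UIMA or TreeTagger APIs: the Token class, the mergeSeparatedParticles method, and the rule of attaching a particle to the nearest preceding finite verb are assumptions made for illustration only. The STTS tags PTKVZ (separated verb particle) and VVFIN (finite full verb) are used to find the two parts; the example tokens are hand-written, not real tagger output.

```java
import java.util.ArrayList;
import java.util.List;

public class SeparatedParticleSketch {

    /** Hypothetical stand-in for a lemmatized, POS-tagged token (not a UIMA type). */
    static final class Token {
        final String text;
        final String pos;   // STTS tag
        String lemma;

        Token(String text, String pos, String lemma) {
            this.text = text;
            this.pos = pos;
            this.lemma = lemma;
        }
    }

    /**
     * For every separated particle (PTKVZ), looks backwards in the sentence
     * for the nearest finite full verb (VVFIN) and prefixes that verb's lemma
     * with the particle's lemma. The particle's own lemma is left unchanged.
     * The "nearest preceding verb" heuristic is an assumption of this sketch.
     */
    static void mergeSeparatedParticles(List<Token> sentence) {
        for (int i = 0; i < sentence.size(); i++) {
            Token particle = sentence.get(i);
            if (!"PTKVZ".equals(particle.pos)) {
                continue;
            }
            for (int j = i - 1; j >= 0; j--) {
                Token candidate = sentence.get(j);
                if ("VVFIN".equals(candidate.pos)) {
                    candidate.lemma = particle.lemma + candidate.lemma;
                    break;
                }
            }
        }
    }

    /** Hand-written tokens for "Wir fangen gleich an." */
    static List<Token> example() {
        List<Token> sentence = new ArrayList<>();
        sentence.add(new Token("Wir", "PPER", "wir"));
        sentence.add(new Token("fangen", "VVFIN", "fangen"));
        sentence.add(new Token("gleich", "ADV", "gleich"));
        sentence.add(new Token("an", "PTKVZ", "an"));
        sentence.add(new Token(".", "$.", "."));
        return sentence;
    }

    public static void main(String[] args) {
        List<Token> sentence = example();
        mergeSeparatedParticles(sentence);
        for (Token t : sentence) {
            System.out.println(t.text + "\t" + t.pos + "\t" + t.lemma);
        }
    }
}
```

For the example sentence, the lemma of "fangen" becomes "anfangen", while the particle "an" keeps its lemma "an". A real annotator would operate on the token and lemma annotations in the CAS rather than on a plain list, and may use a different rule to pair particles with verb stems.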