DKPro TC works with two data types that are expected to be present when an experiment is executed: the TextClassificationTarget and the corresponding actual value of this target, the TextClassificationOutcome. For sequence classification, an additional TextClassificationSequence is necessary that explicitly marks the span of the sequence. The data readers provided by DKPro TC in the package dkpro-tc-io set these values in the reader. When one of the many DKPro Core data format readers is used instead, this information is not yet set, because the required data types exist only in DKPro TC and are not used by DKPro Core.
Consequently, an additional step is needed that adds the required annotations. This is most easily done by adding a Preprocessing step to a DKPro TC experiment. Part-of-Speech (PoS) tagging serves as the example here. In PoS tagging, each token (TextClassificationTarget) of a sentence (TextClassificationSequence) is assigned a single label (TextClassificationOutcome). To be usable in the Preprocessing step, the class that adds these annotations has to inherit from JCasAnnotator_ImplBase.
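As a rough, library-free illustration of the logic such an annotator implements, the sketch below models annotations as simple spans. The class `SequenceOutcomeSketch`, the nested `Span` type, and the method `annotate` are hypothetical stand-ins defined only for this example; they are not the UIMA or DKPro TC classes. A real component would extend JCasAnnotator_ImplBase and add the annotations to the JCas instead of returning a list.

```java
import java.util.ArrayList;
import java.util.List;

final class SequenceOutcomeSketch {

    /** Hypothetical stand-in for a typed annotation: a kind, a character span, an optional value. */
    static final class Span {
        final String kind;
        final int begin;
        final int end;
        final String value;

        Span(String kind, int begin, int end, String value) {
            this.kind = kind;
            this.begin = begin;
            this.end = end;
            this.value = value;
        }
    }

    /**
     * For every sentence, adds one sequence span. For every token covered by
     * that sentence, adds a target span (with a running id) and an outcome
     * span that carries the token's PoS tag as the label.
     */
    static List<Span> annotate(List<Span> sentences, List<Span> tokens) {
        List<Span> result = new ArrayList<>();
        int targetId = 0;
        for (Span sentence : sentences) {
            result.add(new Span("TextClassificationSequence", sentence.begin, sentence.end, null));
            for (Span token : tokens) {
                boolean covered = token.begin >= sentence.begin && token.end <= sentence.end;
                if (!covered) {
                    continue;
                }
                result.add(new Span("TextClassificationTarget", token.begin, token.end,
                        String.valueOf(targetId++)));
                result.add(new Span("TextClassificationOutcome", token.begin, token.end,
                        token.value));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Offsets refer to the text "Dogs bark."; the tags are supplied by hand here.
        List<Span> sentences = List.of(new Span("Sentence", 0, 10, null));
        List<Span> tokens = List.of(
                new Span("Token", 0, 4, "NNS"),
                new Span("Token", 5, 9, "VBP"),
                new Span("Token", 9, 10, "."));
        for (Span a : annotate(sentences, tokens)) {
            System.out.println(a.kind + " [" + a.begin + "," + a.end + ") " + a.value);
        }
    }
}
```

The point of the sketch is the one-to-one correspondence: every token yields exactly one target and one outcome with identical offsets, and every sentence yields exactly one sequence that covers them.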
To add the Preprocessing to your experiment, only a minor modification to your code is necessary.
The Preprocessing is not limited to a single component. If we read plain text with the reader, we would additionally need tokenization (to split the text into sentences and words) and a PoS tagger that provides the expected outcome in order to train a sequence classifier. In practice, you probably do not want to train a model on tags that were annotated automatically, but for the sake of this example, let's assume you do. In this case, the preprocessing chains a segmenter, a PoS tagger, and the SequenceOutcomeAnnotator. Note the order of the preprocessing steps: PoS tagging requires tokens, which is why the segmentation step comes first; the SequenceOutcomeAnnotator requires both the tokens and the PoS tags and is consequently the last component.
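The ordering constraint can be made concrete with a small, library-free sketch. It assumes each component can be described by the annotation types it requires and the types it provides; the class `PreprocessingOrderSketch`, the `Step` type, and the method `firstMisplacedStep` are hypothetical names used only for illustration and are not part of DKPro TC or uimaFIT.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PreprocessingOrderSketch {

    /** Hypothetical description of a component: the types it needs and the types it adds. */
    static final class Step {
        final String name;
        final Set<String> requires;
        final Set<String> provides;

        Step(String name, Set<String> requires, Set<String> provides) {
            this.name = name;
            this.requires = requires;
            this.provides = provides;
        }
    }

    /**
     * Walks the pipeline in order and returns the name of the first step whose
     * required types have not been provided by an earlier step, or null if
     * every step finds its inputs.
     */
    static String firstMisplacedStep(List<Step> pipeline) {
        Set<String> available = new HashSet<>();
        for (Step step : pipeline) {
            if (!available.containsAll(step.requires)) {
                return step.name;
            }
            available.addAll(step.provides);
        }
        return null;
    }

    public static void main(String[] args) {
        Step segmenter = new Step("Segmenter",
                Set.of(), Set.of("Sentence", "Token"));
        Step tagger = new Step("PosTagger",
                Set.of("Sentence", "Token"), Set.of("POS"));
        Step outcome = new Step("SequenceOutcomeAnnotator",
                Set.of("Sentence", "Token", "POS"),
                Set.of("TextClassificationSequence", "TextClassificationTarget",
                        "TextClassificationOutcome"));

        // Order described in the text: segmentation, then tagging, then outcomes.
        System.out.println(firstMisplacedStep(List.of(segmenter, tagger, outcome)));
        // Tagger placed before the segmenter: its inputs are missing.
        System.out.println(firstMisplacedStep(List.of(tagger, segmenter, outcome)));
    }
}
```

A check like this is only a reasoning aid; in an actual pipeline a misordered component would typically just find no annotations to work on.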