Anatomy of a DKPro TC experiment
In the following, we introduce the key concepts necessary for using DKPro TC by discussing a minimal setup and walking the reader through the building blocks of an experiment.
Dimensions and the parameter space
An experiment consists of (i) several dimensions that are (ii) combined in a parameter space, which is then passed to the experiment. Regarding (i), dimensions are the basic building blocks of an experimental setup: almost every parameter that is varied in an experiment is set via a dimension. The dimensions in the code declare three building blocks: first, the readers that provide the data for the experiment; second, the feature set used in the experiment; and third, the classification arguments that specify which classifier is to be used (Liblinear in this case). Regarding (ii), the parameter space is the main data structure that DKPro TC uses in the background. All created dimensions must be added to the parameter space; otherwise they are not used.
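The relation between dimensions and the parameter space can be illustrated with a small, self-contained Java sketch. It does not use the DKPro TC classes: `ParameterSpaceSketch`, its nested `Dimension` class, and `expand` are hypothetical names introduced only for this illustration. The sketch also assumes that a parameter space yields one configuration per combination of dimension values, which the text above does not state explicitly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; these are NOT the DKPro TC classes.
public class ParameterSpaceSketch {

    /** A named dimension holding the alternative values to try. */
    static final class Dimension {
        final String name;
        final List<String> values;

        Dimension(String name, List<String> values) {
            this.name = name;
            this.values = values;
        }
    }

    /**
     * Expands the given dimensions into all configurations, picking one value
     * per dimension (assumption: combinations form a Cartesian product).
     */
    static List<Map<String, String>> expand(List<Dimension> dimensions) {
        List<Map<String, String>> configs = new ArrayList<>();
        configs.add(new LinkedHashMap<>()); // start with one empty configuration
        for (Dimension dim : dimensions) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> partial : configs) {
                for (String value : dim.values) {
                    Map<String, String> extended = new LinkedHashMap<>(partial);
                    extended.put(dim.name, value);
                    next.add(extended);
                }
            }
            configs = next;
        }
        return configs;
    }

    public static void main(String[] args) {
        // The three building blocks named in the text, one value each.
        List<Dimension> space = List.of(
                new Dimension("readers", List.of("train+test")),
                new Dimension("featureSet", List.of("word-ngrams")),
                new Dimension("classifier", List.of("Liblinear")));
        expand(space).forEach(System.out::println);
    }
}
```

A dimension that is created but left out of the list passed to `expand` never appears in any configuration, which mirrors the requirement that every dimension be added to the parameter space.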
DKPro TC has two experimental modes: a train-test experiment (shown in the code snippet), in which the user provides a fixed train-test data split, and cross-validation, in which DKPro TC automatically splits the data into the requested number of folds.
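The difference between the two modes can be sketched in plain Java. This is not DKPro TC code: `SplitSketch`, `Split`, `trainTest`, and `crossValidation` are hypothetical names, and the round-robin assignment of items to folds is an assumption made for the example, since the text does not say how DKPro TC assigns data to folds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the two modes; not DKPro TC code.
public class SplitSketch {

    /** One partition of the data into a training and a testing part. */
    static final class Split {
        final List<String> train;
        final List<String> test;

        Split(List<String> train, List<String> test) {
            this.train = train;
            this.test = test;
        }
    }

    /** Train-test mode: the user supplies both parts; nothing is re-split. */
    static Split trainTest(List<String> train, List<String> test) {
        return new Split(train, test);
    }

    /** Cross-validation mode: every fold serves exactly once as test data. */
    static List<Split> crossValidation(List<String> data, int folds) {
        if (folds < 2 || folds > data.size()) {
            throw new IllegalArgumentException("folds must be in [2, data.size()]");
        }
        List<Split> splits = new ArrayList<>();
        for (int f = 0; f < folds; f++) {
            List<String> train = new ArrayList<>();
            List<String> test = new ArrayList<>();
            for (int i = 0; i < data.size(); i++) {
                // Assumption: round-robin assignment, item i belongs to fold i % folds.
                if (i % folds == f) {
                    test.add(data.get(i));
                } else {
                    train.add(data.get(i));
                }
            }
            splits.add(new Split(train, test));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("d0", "d1", "d2", "d3", "d4");
        for (Split s : crossValidation(docs, 2)) {
            System.out.println("train=" + s.train + " test=" + s.test);
        }
    }
}
```

In train-test mode a single model is trained and evaluated once, whereas in the cross-validation sketch each item is used for testing exactly once across the requested folds.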
Results of an experiment
The results are written to the folder provided as the DKPRO_HOME directory. The subfolders contain all output written by an experiment, not just the final results. The folder with the results is the Evaluation-* folder. The other folders are usually not important for using DKPro TC, but we briefly explain their content. For a train-test experiment, the following folders are created:
InitTask folders contain the provided training and testing data converted into an internal data format.
OutcomeCollectionTask collects all labels that occur in the training and testing data (or nothing if the task is regression).
MetaInfoTask prepares the use of features that rely on a frequency cut-off, e.g. the word n-gram feature used in the experimental setup.
ExtractFeatureTask contains the extracted features in the data format the respective classifier expects.
<MachineLearningAdapter> executes the actual classifier on the feature data extracted before. The results per instance and some more low-level information can be found in the