DKPro TC - Basic Experiments

Anatomy of a DKPro TC experiment

Subsequently, we introduce how to configure a DKPro TC experiement by discussing a minimal setup.

// Defining the readers that read the data that we use in an experiment
public CollectionReaderDescription getReaderTrain() { ... }
public CollectionReaderDescription getReaderTest() { ... }

public AnalysisEngineDescription getPreprocessing() { ... }

TcFeatureSet featureSet = new TcFeatureSet(
				TcFeatureFactory.create(WordNGram.class, 
							WordNGram.PARAM_NGRAM_USE_TOP_K, 50));

// The experiment builder lets the user wire an experiment with few steps
 ExperimentBuilder builder = new ExperimentBuilder();
// Setup - providing all parameters and machine learning adapters that are used for an experiment
builder.experiment(ExperimentType.TRAIN_TEST, "trainTest")
       .dataReaderTrain(getReaderTrain())
       .dataReaderTest(getReaderTest())
       .preprocessing(getPreprocessing())
       .featureSets(featureSet)
       .learningMode(LearningMode.SINGLE_LABEL)
       .featureMode(FeatureMode.DOCUMENT)
       .machineLearningBackend(
       // an example how to configure an execution of four classifiers in a setup.
       // DKPro TC will run the feature extraction and use each classifier
       new MLBackend(new XgboostAdapter(), "objective=multi:softmax"),
                     new MLBackend(new WekaAdapter(), SMO.class.getName(), "-C", "1.0", "-K",
                                                      PolyKernel.class.getName() + " " + "-C -1 -E 2"),
                     new MLBackend(new LiblinearAdapter(), "-s", "4", "-c", "100"),
                     new MLBackend(new LibsvmAdapter(), "-s", "1", "-c", "1000", "-t", "3"));
	// Execute the experiment		
        builder.run();

Results of an experiment

The results are written to the folder provided as DKPRO_HOME directory. The subfolder contain all output written by an experiment, and not just the final results. The folder with the results is the Evaluation-* folder. The other folders are probably not of importance for using DKPRo TC, but we explain their content yet briefly. For a train-test experiment, the following folders are created:

  • InitTask-Train-ExperimentName-*
  • InitTask-Test-ExperimentName-*
  • OutcomeCollectionTask-ExperimentName-*
  • MetaInfoTask-ExperimentName-*
  • ExtractFeaturesTask-Train-ExperimentName-*
  • ExtractFeaturesTask-Test-ExperimentName-*
  • DKProTcShallowTestTask-ExperimentName-*
  • <MachineLearningAdapter>-ExperimentName-*
  • Evaluation-ExperimentName-*

The InitTask folders contain the provided training and testing data converted into an internal data format. OutcomeCollectionTask collects all occurring labels in the training and testing data (or nothing if its regression). MetaInfoTask prepares the usage of features that use a frequency cut-off, i.e. the word-ngram feature that is used in the experimental setup. ExtractFeatureTask contain the extracted features in the data format the respective classifier expects. DKProTcShallowTestTask and <MachineLearningAdapter> execute the actual classifier with the feature data extracted before. The results per instance and some more low-level information can be found in the <MachineLearningAdapter> folder.