Deep learning experiments with DKPro TC
At the moment, DKPro TC supports experiments with three deep learning frameworks:
- Deeplearning4j (https://deeplearning4j.org)
- DyNet (https://github.com/clab/dynet)
- Keras (https://keras.io)
DyNet and Keras are used via Python. Using these two frameworks in DKPro TC requires that:
- Python is locally installed
- the deep learning framework and all of its dependencies are locally installed
Deeplearning4j is written in Java and requires no additional installation effort.
Python-based Deep Learning Experiments in DKPro TC
Python-based frameworks are not as straightforward to integrate as Java-based frameworks. This section describes how to use Python-based frameworks in DKPro TC and how the interface between Java and Python works. A Python-based DKPro TC deep learning experiment is set up much like a shallow learning experiment. The biggest difference is the wiring of the ParameterSpace, which uses a few additional dimensions.
When the experiment is executed, the training and testing data are automatically vectorized into integers, the word embeddings are pruned to contain only the vocabulary that occurs in the data, and all of these are passed to the code snippet whose file path is given in the dimension DIM_USER_CODE. The receiving Python code is then responsible for loading the provided data files into the data format the framework expects.
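The loading step on the Python side can be sketched as follows. This is only an illustration under assumptions that this document does not confirm: each data file is assumed to hold one instance per line as space-separated integer ids, and the helper names load_int_sequences and pad are hypothetical. The actual layout of the files written by DKPro TC may differ and should be checked against the generated output.

```python
from typing import List


def load_int_sequences(path: str) -> List[List[int]]:
    """Read one instance per line; each line holds space-separated integer ids.

    ASSUMPTION: this file layout is illustrative, not the documented DKPro TC format.
    """
    sequences = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip empty lines
                sequences.append([int(token) for token in line.split()])
    return sequences


def pad(sequences: List[List[int]], max_len: int, pad_id: int = 0) -> List[List[int]]:
    """Truncate or right-pad every sequence to exactly max_len entries."""
    return [seq[:max_len] + [pad_id] * (max_len - len(seq)) for seq in sequences]
```

The padded lists can then be converted into whatever array or tensor type the chosen framework expects.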
Results of an experiment
The results are written to the directory specified as DKPRO_HOME. Its subfolders contain all output written by an experiment, not just the final results. The final results are located in the Evaluation-* folder. The other folders are usually not important for using DKPro TC, but their content is briefly explained here. For a train-test experiment, the following folders are created:
- InitTaskDeep-Train-ExperimentName-*
- InitTaskDeep-Test-ExperimentName-*
- EmbeddingTask-ExperimentName-*
- VectorizationTask-Train-ExperimentName-*
- VectorizationTask-Test-ExperimentName-*
- DKProTcShallowTestTask-ExperimentName-*
- <MachineLearningAdapter>-ExperimentName-*
- Evaluation-ExperimentName-*
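To pick out the result folder from this list programmatically, a minimal sketch might look like the following. The function name find_evaluation_folders is hypothetical, and the search is recursive because the exact nesting depth below DKPRO_HOME is an assumption here rather than something this document states.

```python
from pathlib import Path
from typing import List


def find_evaluation_folders(dkpro_home: str) -> List[Path]:
    """Return all directories below dkpro_home whose name starts with 'Evaluation-'."""
    root = Path(dkpro_home)
    # rglob searches recursively, so the nesting depth of the output folders does not matter
    return sorted(p for p in root.rglob("Evaluation-*") if p.is_dir())
```

Calling find_evaluation_folders with the path configured as DKPRO_HOME returns one entry per executed experiment, sorted by path.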
The InitTaskDeep folders contain the provided training and testing data converted into an internal data format. EmbeddingTask prunes the provided embedding (if one was provided) or initializes missing words with a random vector. This step does nothing if no embedding is provided. VectorizationTask transforms the training and testing data into a flat file format, which is provided to the deep learning code in <MachineLearningAdapter>. The results per instance and some more low-level information can be found in the <MachineLearningAdapter> folder.
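The idea behind the embedding and vectorization steps can be illustrated with a short Python sketch. It is not the DKPro TC implementation: the function names, the random value range of -0.25 to 0.25, and the choice to reserve id 0 for padding are all assumptions made for illustration.

```python
import random
from typing import Dict, Iterable, List


def prune_embedding(
    embedding: Dict[str, List[float]],
    vocabulary: Iterable[str],
    dim: int,
    seed: int = 42,
) -> Dict[str, List[float]]:
    """Keep only vectors for vocabulary words; give words without a vector a random one."""
    rng = random.Random(seed)  # fixed seed keeps the random vectors reproducible
    pruned = {}
    for word in vocabulary:
        if word in embedding:
            pruned[word] = embedding[word]
        else:
            # ASSUMPTION: the value range is arbitrary and chosen for illustration only
            pruned[word] = [rng.uniform(-0.25, 0.25) for _ in range(dim)]
    return pruned


def build_word_ids(vocabulary: Iterable[str]) -> Dict[str, int]:
    """Assign each distinct word an integer id, starting at 1 (0 is left for padding)."""
    return {word: index for index, word in enumerate(sorted(set(vocabulary)), start=1)}


def vectorize(tokens: List[str], word_to_id: Dict[str, int]) -> List[int]:
    """Map each token to its integer id."""
    return [word_to_id[token] for token in tokens]
```

Words that never occur in the data, such as unused entries of a large pretrained embedding, are dropped, which keeps the files handed to the Python code small.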