DKPro TC is a UIMA-based text classification framework built on top of DKPro Core and DKPro Lab. It is intended to facilitate supervised machine learning experiments with any kind of textual data.

DKPro TC comes with

getting-started example code for standard text collections, e.g. the Reuters-21578 Text Categorization corpus, in Java and Groovy
many generic feature extractors, e.g. n-grams, POS tags, etc.
convenient parameter optimization capabilities
comprehensive reporting with support for many standard performance measures
support for single- and multi-label classification and regression in various frameworks, e.g. CRFsuite, DyNet, DeepLearning4j, LibLinear, LibSvm, Keras, SvmHmm, VowpalWabbit, Weka, and XGBoost

If you want to use the latest (snapshot) version of DKPro TC, please keep in mind that the project is subject to constant change.

How to cite?

If you use DKPro TC in research, please cite the following papers:

Johannes Daxenberger, Oliver Ferschke, Iryna Gurevych, and Torsten Zesch (2014). DKPro TC: A Java-based Framework for Supervised Learning Experiments on Textual Data. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (System Demonstrations), pp. 61-66, Baltimore, Maryland, USA.

Tobias Horsmann and Torsten Zesch (2018). DeepTC - An Extension of DKPro Text Classification for Fostering Reproducibility of Deep Learning Experiments. In: Proceedings of the International Conference on Language Resources and Evaluation (LREC), pp. 2539-2545, Miyazaki, Japan.

License

While most DKPro TC modules are available under the Apache Software License (ASL) version 2, there are a few modules that depend on external libraries and are thus licensed under the GPL. The license of each individual module is specified in its LICENSE file.

Note that while each component's source code is itself licensed under the ASL or GPL, individual components might make use of third-party libraries or products that are not licensed under the ASL or GPL. Please make sure that you are aware of the third-party licenses and respect them.

About

This project was initiated under the auspices of Prof. Iryna Gurevych, Ubiquitous Knowledge Processing Lab (UKP), Technische Universität Darmstadt. It is now jointly developed by UKP Lab (Technische Universität Darmstadt), Language Technology Lab (Universität Duisburg-Essen), and other contributors.
