Welcome

DKPro is a community of projects focusing on reusable Natural Language Processing software.

DKPro Core

Ready-to-use software components for natural language processing, based on the Apache UIMA framework.

DKPro Cassis

Pure-Python implementation of the Common Analysis System (CAS) as defined by the UIMA framework, including the ability to load and save UIMA CAS XMI files.

DKPro TC

UIMA-based text classification framework built on top of DKPro Core, DKPro Lab and the Weka Machine Learning Toolkit. It is intended to simplify supervised machine learning experiments with any kind of textual data.

DKPro Statistics

Collection of open-licensed statistical tools, currently including correlation and inter-rater agreement methods.

DKPro Similarity

Framework for developing text similarity algorithms.

DKPro WSD

Modular, extensible Java framework for word sense disambiguation.

DKPro BigData

Facilitates the use of DKPro UIMA components with Apache Hadoop.

DKPro C4Corpus

Tools for processing the Common Crawl corpus, including Creative Commons license detection, boilerplate removal, language detection, and near-duplicate removal.

DKPro Keyphrases

Framework for keyphrase extraction.

DKPro Lab

Lightweight framework for parameter-sweeping experiments. It allows you to set up experiments consisting of multiple interdependent tasks in a declarative manner with minimal overhead.

DKPro LSR

Unified API for several lexical-semantic resources.

DKPro Toolbox

Easy-to-use interface to the DKPro Core libraries, mainly for teaching purposes and inspired by NLTK.

CSniper

Search-based annotation tool that helps distributed annotation teams find infrequent linguistic phenomena in large corpora.

Uby

Framework for creating and accessing sense-linked lexical resources in accordance with the UBY-LMF lexicon model, an instantiation of the ISO standard Lexicon Markup Framework (LMF).

JOTL

The Java OpenThesaurus Library provides access to all information contained in OpenThesaurus, such as glosses, usage examples, translations and much more.

jWeb1t

Efficient access to Web1T n-gram data.

JOWKL

The Java OmegaWiki Library provides access to all information contained in OmegaWiki, such as glosses, usage examples, translations and much more.

JWKTL

The Java Wiktionary Library provides access to the information contained in Wiktionary.

JWPL

The Java Wikipedia Library provides access to all information contained in Wikipedia.

Here are a few additional projects that are not part of DKPro proper but are closely related to it: they are compatible with DKPro products and build on them.

INCEpTION

A semantic annotation platform offering intelligent assistance and knowledge management.

WebAnno

General-purpose web-based annotation tool for a wide range of linguistic annotations.