DKPro BigData - Welcome

DKPro BigData makes it easy to run UIMA-based natural language processing pipelines on a Hadoop cluster.


  • Large-scale NLP using UIMA and Hadoop
  • Store your corpora on a Hadoop filesystem and access them from local or distributed pipelines
  • Find patterns in your textual data using adaptable collocation extraction


  • Execute DKPro pipelines on a Hadoop cluster with minimal adaptation
  • Read data stored on HDFS using DKPro collection readers
  • Read serialized CASes from HDFS and write them back to it


  • Hans-Peter Zorn
  • Johannes Simon
  • Martin Riedl
  • Richard Eckart de Castilho
  • Steffen Remus


DKPro BigData is licensed under the Apache Software License (ASL), Version 2.0.

This project is a joint effort of the UKP Lab and the Language Technology Group, both at the Technical University of Darmstadt.