DKPro BigData makes it easy to run UIMA-based natural language processing pipelines on a Hadoop cluster.

Features

Large-scale NLP using UIMA and Hadoop
Store your corpora on a Hadoop filesystem and access them from local or distributed pipelines
Find patterns in your textual data using adaptable collocation extraction
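
To give a feel for what collocation extraction computes, the following is a minimal, self-contained sketch that ranks adjacent word pairs by pointwise mutual information (PMI). It is not DKPro BigData's implementation or API: the function name pmi_collocations, the min_count parameter, and the toy sentence are all assumptions made for illustration, and a real pipeline would run a comparable computation in a distributed fashion over a much larger corpus.

```python
import math
from collections import Counter


def pmi_collocations(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration only; not part of DKPro BigData.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        # Skip rare pairs: PMI is unreliable for low counts.
        if count < min_count:
            continue
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_unigrams
        p_w2 = unigrams[w2] / n_unigrams
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring pairs first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy input (an assumption for the example, not project data).
tokens = "new york is big and new york is busy but the city is new".split()
ranked = pmi_collocations(tokens)
for pair, score in ranked:
    print(pair, round(score, 2))
```

Raising min_count trades recall for more reliable scores; other association measures (for example log-likelihood) could be swapped in for the PMI line without changing the rest of the sketch.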

Details

Execute DKPro pipelines on a Hadoop cluster with minimal adaptation
Read data stored on HDFS using DKPro Collection Readers
Read and write serialized CASes on HDFS

Contributors

Hans-Peter Zorn
Johannes Simon
Martin Riedl
Richard Eckart de Castilho
Steffen Remus

License

DKPro BigData is licensed under the Apache Software License (ASL), Version 2.0.

This project is a joint effort of UKP Lab and the Language Technology Group, Technical University of Darmstadt.
