DKPro BigData - Welcome

DKPro BigData makes it easy to run UIMA-based natural language processing pipelines on a Hadoop cluster.

Features

  • Large-scale NLP using UIMA and Hadoop
  • Store your corpora on a Hadoop filesystem and access them from local or distributed pipelines
  • Find patterns in your textual data using adaptable collocation extraction
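
To give a rough idea of what collocation extraction means, here is a minimal, generic sketch that scores adjacent word pairs by pointwise mutual information (PMI). It is not DKPro BigData's implementation and does not use its API. The function name `pmi_collocations` and the `min_count` parameter are hypothetical, chosen only for this illustration, and PMI is just one common association measure.

```python
import math
from collections import Counter


def pmi_collocations(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Illustrative only: a single-machine toy, not the DKPro BigData method.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        # Skip rare pairs: PMI is unreliable for low counts.
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring pairs first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    text = "new york is big and new york is busy"
    for pair, score in pmi_collocations(text.split()):
        print(pair, round(score, 2))
```

On a cluster, the counting step would typically be distributed across nodes and the scoring applied to the aggregated counts; the sketch above only shows the scoring idea on a single machine.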

Details

  • Execute DKPro pipelines on a Hadoop cluster with minimal adaptation
  • Read data stored on HDFS using DKPro Collection Readers
  • Read and write serialized CASes from and to HDFS

Contributors

  • Hans-Peter Zorn
  • Johannes Simon
  • Martin Riedl
  • Richard Eckart de Castilho
  • Steffen Remus

License

DKPro BigData is licensed under the Apache Software License (ASL) Version 2.0.

This project is a joint effort of UKP Lab and the Language Technology Group, Technical University of Darmstadt.
