DKPro BigData makes it easy to run UIMA-based natural language processing pipelines on a Hadoop cluster.
Features
- Large-scale NLP using UIMA and Hadoop
- Store your corpora on a Hadoop filesystem and access them from local or distributed pipelines
- Find patterns in your textual data using adaptable collocation extraction
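To give a rough idea of what collocation extraction involves, here is a minimal, self-contained sketch that scores adjacent word pairs by pointwise mutual information (PMI). It is an illustration only and does not use or reproduce the DKPro BigData API: the class name `CollocationSketch`, the method `pmiScores`, and the `minCount` threshold are all hypothetical, and the project's own extraction may use different measures and a distributed implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the DKPro BigData API.
final class CollocationSketch {

    /**
     * Scores adjacent word pairs (bigrams) by pointwise mutual information:
     * PMI(a, b) = log2( P(a b) / (P(a) * P(b)) ).
     * Pairs seen fewer than minCount times are skipped, because PMI
     * overrates rare events. Assumes tokens are non-empty and contain no spaces.
     */
    static Map<String, Double> pmiScores(List<String> tokens, int minCount) {
        Map<String, Integer> unigrams = new HashMap<>();
        Map<String, Integer> bigrams = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            unigrams.merge(tokens.get(i), 1, Integer::sum);
            if (i + 1 < tokens.size()) {
                bigrams.merge(tokens.get(i) + " " + tokens.get(i + 1), 1, Integer::sum);
            }
        }

        Map<String, Double> scores = new HashMap<>();
        int totalBigrams = tokens.size() - 1;
        if (totalBigrams < 1) {
            return scores; // fewer than two tokens: no pairs to score
        }
        double totalTokens = tokens.size();
        for (Map.Entry<String, Integer> entry : bigrams.entrySet()) {
            if (entry.getValue() < minCount) {
                continue;
            }
            String[] words = entry.getKey().split(" ");
            double pPair = entry.getValue() / (double) totalBigrams;
            double pFirst = unigrams.get(words[0]) / totalTokens;
            double pSecond = unigrams.get(words[1]) / totalTokens;
            // Convert the natural logarithm to base 2.
            scores.put(entry.getKey(), Math.log(pPair / (pFirst * pSecond)) / Math.log(2));
        }
        return scores;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "new", "york", "is", "big", "new", "york", "is", "old");
        // Print every bigram that occurs at least twice, with its PMI score.
        pmiScores(tokens, 2).forEach((pair, score) ->
                System.out.println(pair + "\t" + score));
    }
}
```

On a cluster, the counting step is the part that would be distributed; the scoring itself only needs the aggregated counts.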
Details
- Execute DKPro pipelines on a Hadoop cluster with minimal adaptation
- Read data stored on HDFS using DKPro Collection Readers
- Read and write serialized CASes on HDFS
Contributors
- Hans-Peter Zorn
- Johannes Simon
- Martin Riedl
- Richard Eckart de Castilho
- Steffen Remus
License
DKPro BigData is licensed under the Apache License, Version 2.0 (ASL 2.0).
This project is a joint effort of UKP Lab and the Language Technology Group, Technical University of Darmstadt.