Installing Java and Eclipse
These steps install the basic system requirements for implementing DKPro Core pipelines in Java. They need to be performed only once.
- Download and install the Java SE Development Kit 7 from the Oracle Java Site
- Download and install the Eclipse IDE for Java Developers from the Eclipse website
- The Eclipse IDE for Java Developers already contains support for the Java language and the Maven plugin that we require. Of course, you can use any other Eclipse distribution that supports Java and install the Maven plugin manually.
Running the pipeline
For a start, let’s try a simple analysis pipeline:
- Read an English text file called “document.txt”
- Perform tokenization and sentence boundary detection using OpenNLP
- Perform lemmatization using LanguageTool
- Perform dependency parsing using MaltParser
- Write the result to disk in CoNLL 2006 format
Here is how to run that:
- Open Eclipse
- Create a new Maven project
- Open the file pom.xml and switch to the Dependencies tab
- Add the following dependencies:
Group Id | Artifact Id | Version |
de.tudarmstadt.ukp.dkpro.core | de.tudarmstadt.ukp.dkpro.core.opennlp-asl | 1.6.2 |
de.tudarmstadt.ukp.dkpro.core | de.tudarmstadt.ukp.dkpro.core.languagetool-asl | 1.6.2 |
de.tudarmstadt.ukp.dkpro.core | de.tudarmstadt.ukp.dkpro.core.maltparser-asl | 1.6.2 |
de.tudarmstadt.ukp.dkpro.core | de.tudarmstadt.ukp.dkpro.core.io.text-asl | 1.6.2 |
de.tudarmstadt.ukp.dkpro.core | de.tudarmstadt.ukp.dkpro.core.io.conll-asl | 1.6.2 |
- Create a new class file called Pipeline.java in the package example (that is, src/main/java/example/Pipeline.java) and copy/paste the code below
- Create a new text file called document.txt in the project root
- Run the class Pipeline in the package example
- Right-click on the project folder and select Refresh to see the file created by the pipeline
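A minimal sketch of the Pipeline class for the steps above might look like the following. It assumes the uimaFIT factory methods and the DKPro Core 1.6.2 component classes that come with the dependencies listed in the table, so it only compiles inside the Maven project. The part-of-speech tagging step does not appear in the list of pipeline steps; it is included here as an assumption, because the dependency parser relies on part-of-speech tags.

```java
package example;

import static org.apache.uima.fit.factory.AnalysisEngineFactory.createEngineDescription;
import static org.apache.uima.fit.factory.CollectionReaderFactory.createReaderDescription;
import static org.apache.uima.fit.pipeline.SimplePipeline.runPipeline;

import de.tudarmstadt.ukp.dkpro.core.io.conll.Conll2006Writer;
import de.tudarmstadt.ukp.dkpro.core.io.text.TextReader;
import de.tudarmstadt.ukp.dkpro.core.languagetool.LanguageToolLemmatizer;
import de.tudarmstadt.ukp.dkpro.core.maltparser.MaltParser;
import de.tudarmstadt.ukp.dkpro.core.opennlp.OpenNlpPosTagger;
import de.tudarmstadt.ukp.dkpro.core.opennlp.OpenNlpSegmenter;

public class Pipeline {

    public static void main(String[] args) throws Exception {
        runPipeline(
            // Read the English text file document.txt from the working directory
            createReaderDescription(TextReader.class,
                TextReader.PARAM_SOURCE_LOCATION, "document.txt",
                TextReader.PARAM_LANGUAGE, "en"),
            // Tokenization and sentence boundary detection (OpenNLP)
            createEngineDescription(OpenNlpSegmenter.class),
            // Part-of-speech tagging (assumed extra step, needed by the parser)
            createEngineDescription(OpenNlpPosTagger.class),
            // Lemmatization (LanguageTool)
            createEngineDescription(LanguageToolLemmatizer.class),
            // Dependency parsing (MaltParser)
            createEngineDescription(MaltParser.class),
            // Write the result in CoNLL 2006 format to the working directory
            createEngineDescription(Conll2006Writer.class,
                Conll2006Writer.PARAM_TARGET_LOCATION, "."));
    }
}
```

When the class is run from Eclipse, the working directory is normally the project root, which is why document.txt is expected there and the output shows up next to it after a refresh.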
The result is written to a file called document.txt.conll. In the CoNLL 2006 format, each token occupies one line of tab-separated columns, and sentences are separated by blank lines.
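For orientation, a CoNLL 2006 (CoNLL-X) token line has ten tab-separated columns: ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD and PDEPREL. The following sketch shows how such a line could be read using only the Java standard library. The class name ConllLineDemo and the sample line are made up for illustration; the sample is not actual pipeline output, and the tags and labels you get depend on the models used.

```java
import java.util.Arrays;
import java.util.List;

public class ConllLineDemo {

    // Column names of the CoNLL 2006 (CoNLL-X) format, in file order
    static final List<String> COLUMNS = Arrays.asList(
        "ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG",
        "FEATS", "HEAD", "DEPREL", "PHEAD", "PDEPREL");

    // Splits one token line into its ten tab-separated fields
    static String[] parseTokenLine(String line) {
        // Limit -1 keeps trailing empty fields instead of dropping them
        String[] fields = line.split("\t", -1);
        if (fields.length != COLUMNS.size()) {
            throw new IllegalArgumentException(
                "Expected " + COLUMNS.size() + " columns but found " + fields.length);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical example line, not actual pipeline output
        String line = "1\tThis\tthis\tDT\tDT\t_\t2\tnsubj\t_\t_";
        String[] fields = parseTokenLine(line);
        for (int i = 0; i < fields.length; i++) {
            System.out.println(COLUMNS.get(i) + " = " + fields[i]);
        }
    }
}
```

An underscore marks a column without a value, and the HEAD column holds the ID of the token's governor within the same sentence, with 0 denoting the root.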
Where to go from here?
You can find many more examples of what you can do with DKPro Core and Java on our Java recipes for DKPro Core pipelines page and in the DKPro Core examples project.