Intro using Java

Installing Java and Eclipse

These steps install the basic system requirements needed to implement DKPro Core pipelines using the Java language. They need to be performed only once.

Download and install the Java SE Development Kit 7 from the Oracle Java Site
Download and install the Eclipse IDE for Java Developers from the Eclipse website
- The Eclipse IDE for Java Developers already contains support for the Java language and the Maven plugin that we require. Of course you can use any other Eclipse distribution that supports Java and manually install the Maven plugin.

Running the pipeline

For a start, let’s try a simple analysis pipeline:

Read an English text file called “document.txt”
Perform tokenization and sentence boundary detection using OpenNLP
Perform part-of-speech tagging using OpenNLP
Perform lemmatization using LanguageTool
Perform dependency parsing using MaltParser
Write the result to disk in CoNLL 2006 format

Here is how to run that:

Open Eclipse
Create a new Maven project
Open the file pom.xml, switch to the tab Dependencies
Add the following dependencies

Group Id	Artifact Id	Version
de.tudarmstadt.ukp.dkpro.core	de.tudarmstadt.ukp.dkpro.core.opennlp-asl	1.6.2
de.tudarmstadt.ukp.dkpro.core	de.tudarmstadt.ukp.dkpro.core.languagetool-asl	1.6.2
de.tudarmstadt.ukp.dkpro.core	de.tudarmstadt.ukp.dkpro.core.maltparser-asl	1.6.2
de.tudarmstadt.ukp.dkpro.core	de.tudarmstadt.ukp.dkpro.core.io.text-asl	1.6.2
de.tudarmstadt.ukp.dkpro.core	de.tudarmstadt.ukp.dkpro.core.io.conll-asl	1.6.2

Create a new class called Pipeline in the package example (file src/main/java/example/Pipeline.java) and copy/paste the Java code shown after these steps
Create a new text file called document.txt in the project root and put some English text into it
Run the class Pipeline in the package example
Right-click on the project folder and select Refresh to see the file created by the pipeline
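
If you prefer to edit the XML source of pom.xml instead of using the Dependencies tab, the dependency table in the steps above would correspond to a fragment along the following lines. This is only a sketch built from the group ids, artifact ids and versions listed in that table; it assumes the fragment is placed inside the project element of pom.xml, and it does not show anything else your project may need.

```xml
<dependencies>
  <dependency>
    <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
    <artifactId>de.tudarmstadt.ukp.dkpro.core.opennlp-asl</artifactId>
    <version>1.6.2</version>
  </dependency>
  <dependency>
    <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
    <artifactId>de.tudarmstadt.ukp.dkpro.core.languagetool-asl</artifactId>
    <version>1.6.2</version>
  </dependency>
  <dependency>
    <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
    <artifactId>de.tudarmstadt.ukp.dkpro.core.maltparser-asl</artifactId>
    <version>1.6.2</version>
  </dependency>
  <dependency>
    <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
    <artifactId>de.tudarmstadt.ukp.dkpro.core.io.text-asl</artifactId>
    <version>1.6.2</version>
  </dependency>
  <dependency>
    <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
    <artifactId>de.tudarmstadt.ukp.dkpro.core.io.conll-asl</artifactId>
    <version>1.6.2</version>
  </dependency>
</dependencies>
```

The Java code for Pipeline.java follows.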

package example;

import static org.apache.uima.fit.factory.AnalysisEngineFactory.createEngineDescription;
import static org.apache.uima.fit.factory.CollectionReaderFactory.createReaderDescription;
import static org.apache.uima.fit.pipeline.SimplePipeline.runPipeline;
import de.tudarmstadt.ukp.dkpro.core.io.conll.Conll2006Writer;
import de.tudarmstadt.ukp.dkpro.core.io.text.TextReader;
import de.tudarmstadt.ukp.dkpro.core.languagetool.LanguageToolLemmatizer;
import de.tudarmstadt.ukp.dkpro.core.maltparser.MaltParser;
import de.tudarmstadt.ukp.dkpro.core.opennlp.OpenNlpPosTagger;
import de.tudarmstadt.ukp.dkpro.core.opennlp.OpenNlpSegmenter;

public class Pipeline {

  public static void main(String[] args) throws Exception {
    runPipeline(
        createReaderDescription(TextReader.class,
            TextReader.PARAM_SOURCE_LOCATION, "document.txt",
            TextReader.PARAM_LANGUAGE, "en"),
        createEngineDescription(OpenNlpSegmenter.class),
        createEngineDescription(OpenNlpPosTagger.class),
        createEngineDescription(LanguageToolLemmatizer.class),
        createEngineDescription(MaltParser.class),
        createEngineDescription(Conll2006Writer.class,
            Conll2006Writer.PARAM_TARGET_LOCATION, "."));
  }
}

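The result is written to a file called document.txt.conll. Its content depends on the text in document.txt; for the sentence “Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.” it could look something like this: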

1	Pierre	Pierre	NNP	NNP	_	2	nn	_	_
2	Vinken	Vinken	NNP	NNP	_	9	nsubj	_	_
3	,	,	,	,	_	2	punct	_	_
4	61	61	CD	CD	_	5	num	_	_
5	years	year	NNS	NNS	_	6	measure	_	_
6	old	old	JJ	JJ	_	2	amod	_	_
7	,	,	,	,	_	2	punct	_	_
8	will	will	MD	MD	_	9	aux	_	_
9	join	join	VB	VB	_	0	_	_	_
10	the	the	DT	DT	_	11	det	_	_
11	board	board	NN	NN	_	9	dobj	_	_
12	as	as	IN	IN	_	9	prep	_	_
13	a	a	DT	DT	_	15	det	_	_
14	nonexecutive	nonexecutive	JJ	JJ	_	15	amod	_	_
15	director	director	NN	NN	_	12	pobj	_	_
16	Nov.	Nov.	NNP	NNP	_	15	dep	_	_
17	29	29	CD	CD	_	16	num	_	_
18	.	.	.	.	_	9	punct	_	_
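
To make the output above easier to read: in the CoNLL 2006 (CoNLL-X) format each token line has ten tab-separated columns, namely ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD and PDEPREL, where HEAD is the ID of the governing token and 0 marks the root. The following is a minimal sketch of how such lines could be read back in plain Java. The class ConllColumnsSketch and its methods are hypothetical names made up for this illustration and are not part of DKPro Core; the sketch also assumes that ID and HEAD are always plain integers, as they are in the sample.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch: parse CoNLL 2006 token lines and show each token's head. */
class ConllColumnsSketch {

    /** One token line; only the columns used in this sketch are kept. */
    static final class Token {
        final int id;        // column 1: position in the sentence, starting at 1
        final String form;   // column 2: word form
        final String lemma;  // column 3: lemma
        final String pos;    // column 5: part-of-speech tag
        final int head;      // column 7: id of the head token, 0 for the root
        final String deprel; // column 8: dependency relation to the head

        Token(int id, String form, String lemma, String pos, int head, String deprel) {
            this.id = id;
            this.form = form;
            this.lemma = lemma;
            this.pos = pos;
            this.head = head;
            this.deprel = deprel;
        }
    }

    /** Splits one tab-separated line into its ten columns. */
    static Token parseLine(String line) {
        String[] cols = line.split("\t", -1);
        if (cols.length != 10) {
            throw new IllegalArgumentException("Expected 10 columns but got " + cols.length);
        }
        // Assumption: ID and HEAD are plain integers, as in the sample output.
        return new Token(Integer.parseInt(cols[0]), cols[1], cols[2], cols[4],
                Integer.parseInt(cols[6]), cols[7]);
    }

    /** Renders "dependent --relation--> head" for one token of a sentence. */
    static String describe(List<Token> sentence, Token token) {
        if (token.head == 0) {
            return token.form + " --> ROOT";
        }
        for (Token candidate : sentence) {
            if (candidate.id == token.head) {
                return token.form + " --" + token.deprel + "--> " + candidate.form;
            }
        }
        // The head is not among the given lines (for example, in an excerpt).
        return token.form + " --" + token.deprel + "--> #" + token.head;
    }

    public static void main(String[] args) {
        // Tokens 8 to 11 from the sample output above.
        String[] lines = {
            "8\twill\twill\tMD\tMD\t_\t9\taux\t_\t_",
            "9\tjoin\tjoin\tVB\tVB\t_\t0\t_\t_\t_",
            "10\tthe\tthe\tDT\tDT\t_\t11\tdet\t_\t_",
            "11\tboard\tboard\tNN\tNN\t_\t9\tdobj\t_\t_"
        };
        List<Token> sentence = new ArrayList<Token>();
        for (String line : lines) {
            sentence.add(parseLine(line));
        }
        for (Token token : sentence) {
            System.out.println(describe(sentence, token));
        }
    }
}
```

Running main prints one line per token that names the token, its dependency relation and the word form of its head. To process a whole file, the same parseLine call could be applied to every non-empty line of document.txt.conll.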

Where to go from here?

You can find many more examples of what you can do with DKPro Core and Java on our Java recipes for DKPro Core pipelines page and in the DKPro Core examples project.
