DKPro Core - MaltParser dependency parsing pipeline writing to CONLL format

Analytics

Reads all text files (*.txt) in the specified folder and writes their dependency parses in CoNLL 2006 format to the output folder, one token per line.

Call with pipeline <inputfolder> <language> <outputfolder>, e.g. pipeline input en output.

@Grab(group='de.tudarmstadt.ukp.dkpro.core',
      module='de.tudarmstadt.ukp.dkpro.core.stanfordnlp-gpl',
      version='1.5.0')
@Grab(group='de.tudarmstadt.ukp.dkpro.core',
      module='de.tudarmstadt.ukp.dkpro.core.maltparser-asl',
      version='1.5.0')
@Grab(group='de.tudarmstadt.ukp.dkpro.core',
      module='de.tudarmstadt.ukp.dkpro.core.io.text-asl',
      version='1.5.0')
@Grab(group='de.tudarmstadt.ukp.dkpro.core',
      module='de.tudarmstadt.ukp.dkpro.core.io.conll-asl',
      version='1.5.0')

import static org.apache.uima.fit.pipeline.SimplePipeline.*;
import static org.apache.uima.fit.factory.CollectionReaderFactory.*;
import static org.apache.uima.fit.factory.AnalysisEngineFactory.*;

import de.tudarmstadt.ukp.dkpro.core.stanfordnlp.*;
import de.tudarmstadt.ukp.dkpro.core.maltparser.*;
import de.tudarmstadt.ukp.dkpro.core.io.conll.*;
import de.tudarmstadt.ukp.dkpro.core.io.text.*;

// Assemble and run pipeline: read text, segment, POS-tag, parse, write CoNLL 2006
runPipeline(
    createReaderDescription(TextReader,
        TextReader.PARAM_PATH, args[0],          // first command line parameter: input folder
        TextReader.PARAM_LANGUAGE, args[1],      // second command line parameter: language code
        TextReader.PARAM_PATTERNS, "[+]*.txt"),  // include all *.txt files
    createEngineDescription(StanfordSegmenter),
    createEngineDescription(StanfordPosTagger),
    createEngineDescription(MaltParser),
    createEngineDescription(Conll2006Writer,
        Conll2006Writer.PARAM_TARGET_LOCATION, args[2])); // third command line parameter: output folder

Example output:

1   The    _  DT   DT   _  4  det    _  _
2   quick  _  JJ   JJ   _  4  amod   _  _
3   brown  _  JJ   JJ   _  4  amod   _  _
4   fox    _  NN   NN   _  5  nsubj  _  _
5   jumps  _  VBZ  VBZ  _  0  _      _  _
6   over   _  IN   IN   _  5  prep   _  _
7   the    _  DT   DT   _  9  det    _  _
8   lazy   _  JJ   JJ   _  9  amod   _  _
9   dog    _  NN   NN   _  6  pobj   _  _
10  .      _  .    .    _  5  punct  _  _
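
Reading the output:

Each line holds the ten CoNLL 2006 columns in order: ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL. The following is a minimal sketch, not part of the recipe itself, of how such lines could be turned back into head/dependent pairs in plain Groovy. The names columns, parseSentence and sample are hypothetical, and the sketch assumes whitespace-separated columns with no spaces inside a token.

```groovy
// Column names of the CoNLL 2006 format, in file order
def columns = ['id', 'form', 'lemma', 'cpostag', 'postag',
               'feats', 'head', 'deprel', 'phead', 'pdeprel']

// Hypothetical helper: parse one sentence (one token per line) into a list of maps
def parseSentence = { String block ->
    block.readLines()
         .findAll { it.trim() }                       // skip blank lines
         .collect { line ->
             def fields = line.trim().split(/\s+/)    // assumes whitespace-separated columns
             (0..<columns.size()).collectEntries { i -> [(columns[i]): fields[i]] }
         }
}

// First five tokens of the example output above
def sample = [
    '1 The _ DT DT _ 4 det _ _',
    '2 quick _ JJ JJ _ 4 amod _ _',
    '3 brown _ JJ JJ _ 4 amod _ _',
    '4 fox _ NN NN _ 5 nsubj _ _',
    '5 jumps _ VBZ VBZ _ 0 _ _ _'
].join('\n')

def tokens = parseSentence(sample)

// Print each dependency as relation(head, dependent); HEAD 0 marks the root
tokens.each { tok ->
    def head = tok.head == '0' ? 'ROOT' : tokens.find { it.id == tok.head }.form
    println "${tok.deprel}(${head}, ${tok.form})"
}
```

To process a real output file, the sample string could be replaced with the text of a file from the output folder, split into sentences on blank lines.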