DKPro Core - MaltParser dependency parsing pipeline with direct access to results


Reads the specified file and prints its dependencies, one per line. Multiple files can be specified using a wildcard, e.g. '*.txt'. Put the pattern in single quotes so that the shell passes it to the script unexpanded. The shell removes the quotes, so they are not part of the argument the script receives.

This recipe was motivated by a question on Stack Overflow about how to parse raw text with MaltParser.

Call with pipeline <file or pattern> <language>, e.g. pipeline 'myFolder/*.txt' en.
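The script reads the file pattern from args[0] and the language from args[1]. An unquoted wildcard can therefore push the language code to a later position. A small stand-alone script makes it easy to see what a Groovy script actually receives. The file name showargs.groovy and the describeArgs helper below are hypothetical. They are a debugging sketch and not part of DKPro Core.

```groovy
// showargs.groovy: hypothetical helper (not part of DKPro Core) that
// prints every command line argument exactly as the script receives it.

// Build one "args[i] = value" line per argument, in order.
List<String> describeArgs(String[] argv) {
  List<String> lines = []
  argv.eachWithIndex { String arg, int i ->
    lines << "args[${i}] = ${arg}".toString()
  }
  lines
}

// 'args' is the String[] of command line arguments Groovy passes to a script.
describeArgs(args).each { println it }
```

Running groovy showargs.groovy '*.txt' en prints args[0] = *.txt and args[1] = en. Without the quotes, the shell expands the pattern first. In a directory containing a.txt and b.txt, the script then receives a.txt, b.txt and en as three separate arguments, and the recipe would treat b.txt as the language.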

// The org.apache.uima.fit imports below require a DKPro Core release built on
// Apache uimaFIT 2.x (DKPro Core 1.5.0 still used the older org.uimafit packages).
@Grab(group='de.tudarmstadt.ukp.dkpro.core',
      module='de.tudarmstadt.ukp.dkpro.core.opennlp-asl',
      version='1.7.0')
@Grab(group='de.tudarmstadt.ukp.dkpro.core',
      module='de.tudarmstadt.ukp.dkpro.core.maltparser-asl',
      version='1.7.0')
@Grab(group='de.tudarmstadt.ukp.dkpro.core',
      module='de.tudarmstadt.ukp.dkpro.core.io.text-asl',
      version='1.7.0')
@Grab(group='de.tudarmstadt.ukp.dkpro.core',
      module='de.tudarmstadt.ukp.dkpro.core.io.conll-asl',
      version='1.7.0')

import static org.apache.uima.fit.pipeline.SimplePipeline.*;
import static org.apache.uima.fit.util.JCasUtil.*;
import static org.apache.uima.fit.factory.CollectionReaderFactory.*;
import static org.apache.uima.fit.factory.AnalysisEngineFactory.*;

import de.tudarmstadt.ukp.dkpro.core.opennlp.*;
import de.tudarmstadt.ukp.dkpro.core.maltparser.*;
import de.tudarmstadt.ukp.dkpro.core.io.text.*;
import de.tudarmstadt.ukp.dkpro.core.api.syntax.type.dependency.*;

// Assemble and run pipeline
def pipeline = iteratePipeline(
  createReaderDescription(TextReader,
    TextReader.PARAM_SOURCE_LOCATION, args[0], // first command line parameter
    TextReader.PARAM_LANGUAGE, args[1]), // second command line parameter
  createEngineDescription(OpenNlpSegmenter),
  createEngineDescription(OpenNlpPosTagger),
  createEngineDescription(MaltParser));

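// Iterate over the processed documents and print relation type,
// governor and dependent of every dependency found by MaltParser
for (jcas in pipeline) {
  select(jcas, Dependency).each {
    println "dep: [${it.dependencyType}] \t gov: [${it.governor.coveredText}] \t dep: [${it.dependent.coveredText}]"
  }
}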

Example output for the input sentence "The quick brown fox jumps over the lazy dog.":

dep: [det]     gov: [jumps]    dep: [The]
dep: [amod]    gov: [jumps]    dep: [quick]
dep: [amod]    gov: [jumps]    dep: [brown]
dep: [nn]      gov: [jumps]    dep: [fox]
dep: [prep]    gov: [jumps]    dep: [over]
dep: [det]     gov: [dog]      dep: [the]
dep: [amod]    gov: [dog]      dep: [lazy]
dep: [pobj]    gov: [over]     dep: [dog]
dep: [punct]   gov: [jumps]    dep: [.]