DKPro Core - MaltParser dependency parsing pipeline with direct access to results

Reads the specified file and prints its dependencies, one per line. Multiple files can be specified using a wildcard, e.g. '*.txt' (the single quotes are part of the argument; they prevent the shell from expanding the wildcard).

Call with pipeline <foldername> <language>, e.g. pipeline myFolder en.
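
The script reads both arguments directly from sys.argv, so a missing argument shows up as an IndexError rather than a usage hint. A minimal sketch of an up-front check is given here in plain Python; the helper name parse_args and the USAGE text are hypothetical and not part of the original recipe.

```python
# Hypothetical usage text, modelled on the call shown above.
USAGE = "usage: pipeline <foldername> <language>   (e.g. pipeline myFolder en)"

def parse_args(argv):
    """Return (location, language) from an argv-style list, or raise ValueError."""
    # argv[0] is the script name; exactly two further arguments are expected.
    if len(argv) != 3:
        raise ValueError(USAGE)
    location, language = argv[1], argv[2]
    # Reject empty strings, which would otherwise reach the reader unchecked.
    if not location or not language:
        raise ValueError(USAGE)
    return location, language
```

In the script, location, language = parse_args(sys.argv) could then replace the direct sys.argv[1] and sys.argv[2] lookups.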

#!/usr/bin/env jython
# Fix classpath scanning - otherwise uimaFIT will not find the UIMA types
from java.lang import Thread
from org.python.core.imp import *
Thread.currentThread().contextClassLoader = getSyspathJavaLoader()

# Dependencies and imports for DKPro modules
from jip.embed import require
require('de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core.opennlp-asl:1.6.1')
require('de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core.maltparser-asl:1.6.1')
require('de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core.io.text-asl:1.6.1')
require('de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core.io.conll-asl:1.6.1')
from de.tudarmstadt.ukp.dkpro.core.opennlp import *
from de.tudarmstadt.ukp.dkpro.core.maltparser import *
from de.tudarmstadt.ukp.dkpro.core.io.text import *
from de.tudarmstadt.ukp.dkpro.core.api.syntax.type.dependency import *

# uimaFIT imports
from org.apache.uima.fit.util.JCasUtil import *
from org.apache.uima.fit.pipeline.SimplePipeline import *
from org.apache.uima.fit.factory.AnalysisEngineFactory import *
from org.apache.uima.fit.factory.CollectionReaderFactory import *

# Access to commandline arguments
import sys

# Assemble and run pipeline
pipeline = iteratePipeline(
  createReaderDescription(TextReader,
    TextReader.PARAM_SOURCE_LOCATION, sys.argv[1], # 1st commandline parameter
    TextReader.PARAM_LANGUAGE, sys.argv[2]),       # 2nd commandline parameter
  createEngineDescription(OpenNlpSegmenter),
  createEngineDescription(OpenNlpPosTagger),
  createEngineDescription(MaltParser))

# Print one line per dependency relation found in each processed document
for jcas in pipeline:
  for dep in select(jcas, Dependency):
    print "dep: [" + dep.dependencyType + "] \t gov: [" + dep.governor.coveredText + "] \t dep: [" + dep.dependent.coveredText + "]"
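
If the results are needed for further processing rather than just printing, the body of the loop above could be factored into small helpers. The following is only a sketch in plain Python so that it can be tried without Jython or DKPro: Token and Dep are hypothetical stand-ins exposing just the attributes the loop reads, and format_dependency and collect_dependencies are illustrative names, not part of DKPro Core or uimaFIT.

```python
from collections import namedtuple

# Hypothetical stand-ins for the UIMA annotations; they expose only the
# attributes the loop above reads (dependencyType, governor, dependent).
Token = namedtuple("Token", ["coveredText"])
Dep = namedtuple("Dep", ["dependencyType", "governor", "dependent"])

def format_dependency(dep):
    """Build one output line in the same layout as the print statement."""
    return "dep: [%s] \t gov: [%s] \t dep: [%s]" % (
        dep.dependencyType, dep.governor.coveredText, dep.dependent.coveredText)

def collect_dependencies(deps):
    """Return (type, governor text, dependent text) triples instead of printing."""
    return [(d.dependencyType, d.governor.coveredText, d.dependent.coveredText)
            for d in deps]

# Sample data for illustration only.
sample = [Dep("det", Token("dog"), Token("the")),
          Dep("amod", Token("dog"), Token("lazy"))]
for dep in sample:
    print(format_dependency(dep))
```

Inside the Jython script, the same helpers would presumably be called with select(jcas, Dependency) in place of the sample list.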

Example output:

dep: [det]   gov: [jumps]    dep: [The]
dep: [amod]      gov: [jumps]    dep: [quick]
dep: [amod]      gov: [jumps]    dep: [brown]
dep: [nn]    gov: [jumps]    dep: [fox]
dep: [prep]      gov: [jumps]    dep: [over]
dep: [det]   gov: [dog]      dep: [the]
dep: [amod]      gov: [dog]      dep: [lazy]
dep: [pobj]      gov: [over]     dep: [dog]
dep: [punct]     gov: [jumps]    dep: [.]
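
Output in this format can be read back into structured data for later analysis. The sketch below is plain Python and assumes the line layout shown above; LINE_RE, parse_line and group_by_governor are hypothetical names introduced only for illustration.

```python
import re

# Matches lines such as "dep: [det]   gov: [jumps]    dep: [The]".
# Fields are separated by arbitrary whitespace (spaces or tabs).
LINE_RE = re.compile(r"^dep: \[(.*?)\]\s+gov: \[(.*?)\]\s+dep: \[(.*)\]\s*$")

def parse_line(line):
    """Return (type, governor, dependent) for one output line, or None."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return match.groups()

def group_by_governor(lines):
    """Map each governor text to its list of (type, dependent) pairs.

    Grouping is by covered text, so distinct tokens with identical text
    end up in the same group.
    """
    groups = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that are not in the expected layout
        dep_type, governor, dependent = parsed
        groups.setdefault(governor, []).append((dep_type, dependent))
    return groups

# Three lines taken from the example output above.
example = [
    "dep: [det]   gov: [dog]      dep: [the]",
    "dep: [amod]      gov: [dog]      dep: [lazy]",
    "dep: [pobj]      gov: [over]     dep: [dog]",
]
groups = group_by_governor(example)
```

Because the output carries only the covered text, token offsets are lost; where they matter, reading them from the annotations inside the pipeline loop is likely the safer route.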