DKPro Core - Decompounding without a ranker resource

Analytics

Tokenizes a sentence, decompounds the compounds in it using the LeftToRightSplitter as the splitter resource and no ranker resource, then prints the tokens and each compound part.

@Grab(group='de.tudarmstadt.ukp.dkpro.core', version='1.7.0',
     module='de.tudarmstadt.ukp.dkpro.core.decompounding-asl')
@Grab(group='de.tudarmstadt.ukp.dkpro.core', version='1.7.0',
     module='de.tudarmstadt.ukp.dkpro.core.opennlp-asl')

import de.tudarmstadt.ukp.dkpro.core.decompounding.uima.annotator.*;
import de.tudarmstadt.ukp.dkpro.core.decompounding.uima.resource.*;

import de.tudarmstadt.ukp.dkpro.core.opennlp.*;

import static org.apache.uima.fit.pipeline.SimplePipeline.*;
import static org.apache.uima.fit.factory.JCasFactory.*;
import static org.apache.uima.fit.factory.AnalysisEngineFactory.*;
import static org.apache.uima.fit.factory.ExternalResourceFactory.*;
import static org.apache.uima.fit.util.JCasUtil.*;

import de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.*;

// Create an empty CAS and set the text and language to analyze.
def doc = createJCas()
doc.documentText = "Wir brauchen einen Aktionsplan."
doc.documentLanguage = "de"

// Tokenize, then decompound with the left-to-right splitter (no ranker resource is configured).
runPipeline(doc,
  createEngineDescription(OpenNlpSegmenter),
  createEngineDescription(
    CompoundAnnotator,
    CompoundAnnotator.PARAM_SPLITTING_ALGO,
      createExternalResourceDescription(
        LeftToRightSplitterResource,
        (Object) LeftToRightSplitterResource.PARAM_DICT_RESOURCE,
          createExternalResourceDescription(SharedDictionary),
        LeftToRightSplitterResource.PARAM_MORPHEME_RESOURCE,
          createExternalResourceDescription(SharedLinkingMorphemes))));

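// Print the covered text of every token.
println select(doc, Token).collect { it.coveredText }

// Print the covered text of every compound part found by the decompounder.
println select(doc, CompoundPart).collect { it.coveredText }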

Example output:

Jul 02, 2014 4:52:49 PM de.tudarmstadt.ukp.dkpro.core.api.resources.ResourceObjectProviderBase info
Information: :: loading settings :: url = jar:file:/usr/share/groovy/lib/ivy.jar!/org/apache/ivy/core/settings/ivysettings.xml
Jul 02, 2014 4:52:49 PM de.tudarmstadt.ukp.dkpro.core.api.resources.ResourceObjectProviderBase configure
Information: Producing resource from [jar:file:/home/santos/.ivy2/cache/de.tudarmstadt.ukp.dkpro.core/de.tudarmstadt.ukp.dkpro.core.opennlp-upstream-sentence-de-maxent/jars/de.tudarmstadt.ukp.dkpro.core.opennlp-upstream-sentence-de-maxent-20120616.jar!/de/tudarmstadt/ukp/dkpro/core/opennlp/lib/sentence-de-maxent.bin] redirected from [jar:file:/home/santos/.ivy2/cache/de.tudarmstadt.ukp.dkpro.core/de.tudarmstadt.ukp.dkpro.core.opennlp-model-sentence-de-maxent/jars/de.tudarmstadt.ukp.dkpro.core.opennlp-model-sentence-de-maxent-20120616.1.jar!/de/tudarmstadt/ukp/dkpro/core/opennlp/lib/sentence-de-maxent.properties]
Jul 02, 2014 4:52:49 PM de.tudarmstadt.ukp.dkpro.core.api.resources.ResourceObjectProviderBase loadResource
Information: Producing resource took 55ms
Jul 02, 2014 4:52:49 PM de.tudarmstadt.ukp.dkpro.core.api.resources.ResourceObjectProviderBase info
Information: :: loading settings :: url = jar:file:/usr/share/groovy/lib/ivy.jar!/org/apache/ivy/core/settings/ivysettings.xml
Jul 02, 2014 4:52:50 PM de.tudarmstadt.ukp.dkpro.core.api.resources.ResourceObjectProviderBase configure
Information: Producing resource from [jar:file:/home/santos/.ivy2/cache/de.tudarmstadt.ukp.dkpro.core/de.tudarmstadt.ukp.dkpro.core.opennlp-upstream-token-de-maxent/jars/de.tudarmstadt.ukp.dkpro.core.opennlp-upstream-token-de-maxent-20120616.jar!/de/tudarmstadt/ukp/dkpro/core/opennlp/lib/token-de-maxent.bin] redirected from [jar:file:/home/santos/.ivy2/cache/de.tudarmstadt.ukp.dkpro.core/de.tudarmstadt.ukp.dkpro.core.opennlp-model-token-de-maxent/jars/de.tudarmstadt.ukp.dkpro.core.opennlp-model-token-de-maxent-20120616.1.jar!/de/tudarmstadt/ukp/dkpro/core/opennlp/lib/token-de-maxent.properties]
Jul 02, 2014 4:52:50 PM de.tudarmstadt.ukp.dkpro.core.api.resources.ResourceObjectProviderBase loadResource
Information: Producing resource took 257ms
[Wir, brauchen, einen, Aktionsplan, .]
[Aktion, plan]
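
In the output above, "Aktionsplan" is split into "Aktion" and "plan", and the linking morpheme "s" between them is dropped. The following is a minimal sketch of how a left-to-right split with a dictionary and a list of linking morphemes can produce such a result. It is not DKPro Core's implementation: the `dictionary` and `morphemes` values are toy stand-ins assumed for illustration (they do not reflect the contents of SharedDictionary or SharedLinkingMorphemes), and the `split` closure is a hypothetical helper.

```groovy
// Toy data, assumed for illustration only.
def dictionary = ['aktion', 'plan'] as Set
def morphemes  = ['s', 'es', 'n', 'en']

// Hypothetical helper: scans the word from the left for a dictionary entry,
// optionally skips one linking morpheme, and recurses on the remainder.
// Returns the list of parts, or null if the word cannot be fully covered.
def split
split = { String word ->
    if (word.isEmpty()) return []
    for (int end = 1; end <= word.length(); end++) {
        def head = word.substring(0, end)
        if (!dictionary.contains(head.toLowerCase())) continue
        def rest = word.substring(end)
        // Try the remainder as-is first, then with a linking morpheme removed.
        for (prefix in [''] + morphemes) {
            if (!rest.startsWith(prefix)) continue
            def tail = split(rest.substring(prefix.length()))
            if (tail != null) return [head] + tail
        }
    }
    return null
}

println split('Aktionsplan')   // prints [Aktion, plan]
```

With this toy dictionary, a word that contains no dictionary entry (for example "Xyz") yields null. A real splitter may produce several candidate splits, and choosing among them is the job a ranker resource would take on.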