This page describes how to process a short paragraph with the Stanford CoreNLP components (StanfordSegmenter, StanfordNamedEntityRecognizer, StanfordParser) and print the noun phrases (NP), together with the named entities (NE) occurring in them, to the console.
All of these components are UIMA annotators that wrap the Stanford CoreNLP software.
Pre-requisites
Either create a new Maven project or add the following to your existing Maven project. This recipe assumes that you have read the introductory material. You will need the following dependencies.
For example, put all your sentences in a text file and read them with one of the default readers from DKPro Core. (The directory containing the file is passed as the first command-line argument.)
It is important to set the language on the reader as a two-letter ISO 639-1 code, because the downstream components rely on the document language to pick their models.
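Because the reader expects a two-letter code, it can help to check the value before the pipeline starts. The following is a minimal sketch in plain Java; the class `LanguageCodeCheck` and its method are hypothetical helpers for illustration and are not part of DKPro Core. It relies only on `java.util.Locale.getISOLanguages()`, which returns the two-letter ISO 639 codes known to the JDK (including a few legacy codes).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper (not part of DKPro Core) that validates a language code. */
final class LanguageCodeCheck {
    // Two-letter ISO 639 codes known to the JDK
    private static final Set<String> TWO_LETTER_CODES =
            new HashSet<>(Arrays.asList(Locale.getISOLanguages()));

    /** Returns true only for a known, lower-case, two-letter code such as "en" or "de". */
    static boolean isTwoLetterIsoCode(String code) {
        return code != null && TWO_LETTER_CODES.contains(code);
    }

    public static void main(String[] args) {
        // Assumption for this sketch: the language is passed as the first argument
        String language = args.length > 0 ? args[0] : "en";
        if (!isTwoLetterIsoCode(language)) {
            throw new IllegalArgumentException(
                    "Expected a two-letter ISO 639-1 code such as 'en', got: " + language);
        }
        System.out.println("Language code accepted: " + language);
    }
}
```

A three-letter code such as `eng` or an upper-case `EN` is rejected by this check, which catches the most common mistakes before any model is loaded.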
Example sentences
As examples, the following sentences are used:
StanfordSegmenter
The StanfordSegmenter tokenizes the text and creates sentence and token annotations. It can serve as a high-quality alternative to other segmenter components such as the BreakIteratorSegmenter.
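To make the idea of a token annotation concrete: an annotation is essentially a begin/end offset pair over the document text. The sketch below is plain Java and does not use the StanfordSegmenter or any DKPro Core class. It uses the JDK's `java.text.BreakIterator`, on which the BreakIteratorSegmenter is presumably based (judging by its name). The class `OffsetTokenSketch` and its `Span` type are hypothetical names chosen for this illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative sketch: tokens as character offsets, computed with the JDK only. */
final class OffsetTokenSketch {
    /** A half-open character span [begin, end) over the document text. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        String coveredText(String text) {
            return text.substring(begin, end);
        }
    }

    static List<Span> tokenize(String text, Locale locale) {
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);
        List<Span> tokens = new ArrayList<>();
        int begin = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; begin = end, end = words.next()) {
            // Keep words and punctuation, skip whitespace-only segments
            if (!text.substring(begin, end).trim().isEmpty()) {
                tokens.add(new Span(begin, end));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The quick fox.";
        for (Span token : tokenize(text, Locale.ENGLISH)) {
            System.out.println(token.begin + "-" + token.end + " " + token.coveredText(text));
        }
    }
}
```

The Stanford components produce the same kind of offset-based annotations, but store them in the UIMA CAS so that later components in the pipeline can read them.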
Pipeline
Add the component to your pipeline as follows:
Output
The resulting annotations may look like this:
StanfordNamedEntityRecognizer
Stanford NER is a named entity recognizer implemented as a CRFClassifier. It labels sequences of words in a text that are the names of things, such as persons and companies.
Models and types
Stanford NER ships with a 4-class model trained for CoNLL, a 7-class model trained for MUC, and a 3-class model trained on both data sets for the intersection of those class sets.
3-class: Location, Person, Organization
4-class: Location, Person, Organization, Misc
7-class: Time, Location, Organization, Person, Money, Percent, Date
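The relationship between the three models can be checked with a few lines of plain Java: intersecting the 4-class and 7-class label sets listed above yields exactly the 3-class set. The class `NerClassSets` is a hypothetical name used only for this illustration; the labels are taken from the list above.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative sketch: the 3-class label set as the intersection of the other two. */
final class NerClassSets {
    static final Set<String> CONLL_4_CLASS = new TreeSet<>(
            Arrays.asList("Location", "Person", "Organization", "Misc"));

    static final Set<String> MUC_7_CLASS = new TreeSet<>(
            Arrays.asList("Time", "Location", "Organization", "Person",
                    "Money", "Percent", "Date"));

    /** Labels that occur in both the 4-class and the 7-class model. */
    static Set<String> sharedClasses() {
        Set<String> shared = new TreeSet<>(CONLL_4_CLASS);
        shared.retainAll(MUC_7_CLASS);
        return shared;
    }

    public static void main(String[] args) {
        // prints [Location, Organization, Person]
        System.out.println(sharedClasses());
    }
}
```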
The corresponding UIMA annotation types are called “Person”, “Organization”, “Location” etc., from the package de.tudarmstadt.ukp.dkpro.core.type.ner.
Pipeline
Add the component to your pipeline as follows:
The default variant for English is all.3class.distsim.crf; other variants can be selected with PARAM_VARIANT.
Output
The resulting annotations may look like this:
StanfordParser
The Stanford Parser is a program that works out the grammatical structure of sentences. Models are available for many languages, e.g. English and German.
Models
Different models are available for the various languages. The parsers include a PCFG (probabilistic context-free grammar) parser and a factored parser.
Pipeline
Add the component to your pipeline as follows:
The default variant for English is factored.
Output
The resulting annotations may look like this:
Consumer
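The consumer's core task is to find, for every noun phrase, the named entities whose offsets lie inside it. The following is a minimal sketch of that containment logic in plain Java, without UIMA. The class `NounPhraseEntitySketch`, the `Span` type, the sample sentence and its offsets are all invented for this illustration and do not come from the recipe's sources. In an actual uimaFIT consumer, the inner loop would typically be replaced by a call such as `JCasUtil.selectCovered`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch: list the named entities covered by each noun phrase. */
final class NounPhraseEntitySketch {
    /** A labeled half-open character span [begin, end) over the document text. */
    static final class Span {
        final String label;
        final int begin;
        final int end;

        Span(String label, int begin, int end) {
            this.label = label;
            this.begin = begin;
            this.end = end;
        }

        /** True if the other span lies completely inside this one. */
        boolean covers(Span other) {
            return begin <= other.begin && other.end <= end;
        }

        String coveredText(String text) {
            return text.substring(begin, end);
        }
    }

    static List<String> report(String text, List<Span> nounPhrases, List<Span> entities) {
        List<String> lines = new ArrayList<>();
        for (Span np : nounPhrases) {
            lines.add("NP: " + np.coveredText(text));
            for (Span ne : entities) {
                if (np.covers(ne)) {
                    lines.add("  NE: " + ne.label + " \"" + ne.coveredText(text) + "\"");
                }
            }
        }
        return lines;
    }

    /** Runs the report on an invented sentence with hand-assigned offsets. */
    static List<String> demo() {
        String text = "Alice Smith visited the city of Berlin.";
        List<Span> nounPhrases = Arrays.asList(
                new Span("NP", 0, 11),    // "Alice Smith"
                new Span("NP", 20, 38));  // "the city of Berlin"
        List<Span> entities = Arrays.asList(
                new Span("Person", 0, 11),     // "Alice Smith"
                new Span("Location", 32, 38)); // "Berlin"
        return report(text, nounPhrases, entities);
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

Comparing offsets rather than strings keeps the logic independent of how often a name occurs in the text: an entity is reported under a noun phrase only if its span actually lies within that phrase.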
Create your experiment
The entire pipeline may look like this:
Output
The final output prints the noun phrases (NP) and the named entities (NE) within them. An example may look like this:
Source
You can find the current sources for this recipe here.