This document provides detailed information about the DKPro Core type system.
The DKPro Core type system forms the interface between all integrated components. Components store their data in the UIMA CAS and retrieve it from there based on this type system. The design uses a rather flat hierarchy and mostly loose coupling between annotations. The type system is offered as a set of modules rather than as a single monolithic type system.
Types
Type | Description |
---|---|
No description |
|
Contains basic information about the article. |
|
No description |
|
This type represents a compound word that can be decomposed, e.g. flowerpot. |
|
No description |
|
A link in the coreference chain. |
|
Database configuration for the connection to the database where the CAS data was retrieved. |
|
A dependency relation between two tokens. |
|
Discourse argument (arg1, arg2) |
|
Attribution annotation (see PDTB for details); it is not connected to any particular relation because it may belong to two relations, and is therefore covered by DiscourseRelation |
|
Discourse connective |
|
Discourse relation |
|
Document structure element. |
|
No description |
|
No description |
|
No description |
|
A general purpose annotation to store document-wide information in the form of arbitrary key-value string pairs. |
|
No description |
|
Morphological categories that can be attached to tokens. |
|
No description |
|
Named entities refer e.g. to persons, locations, organizations and so on. |
|
The part of speech of a word or a phrase. |
|
The Penn Treebank-style phrase structure string. |
|
Represents the phonetic transcription of some textual element (usually a Token). |
|
RST Tree node |
|
No description |
|
The SemArg annotation is attached to semantic arguments of semantic predicates. |
|
One of the predicates of a sentence (often a main verb, but nouns and adjectives can also be predicates). |
|
The SemanticArgument annotation is attached to semantic arguments of semantic predicates. |
|
The SemanticField is a coarse-grained semantic category that can be attached to nouns, verbs or adjectives. |
|
One of the predicates of a sentence (often a main verb, but nouns and adjectives can also be predicates). |
|
No description |
|
Encodes an edit operation that can be interpreted by the ApplyChangesAnnotator. |
|
This type represents a part of a decompounding word. |
|
Stanford CoreNLP Sentiment annotation |
|
No description |
|
No description |
|
No description |
|
This annotation can be used to indicate an alternate surface form. |
|
Information about a tagset (controlled vocabulary). |
|
Annotates the tf.idf score of a token, stem, or lemma. |
|
Used for storing timing information (e.g. for performance testing). |
|
Token is one of the two types commonly produced by a segmenter (the other being Sentence). |
|
An alternative token text which should be used instead of the covered text if set on a token. |
|
An array representing the topic proportions in a document. |
|
Wikipedia link |
|
Represents a revision in Wikipedia. |
|
An array representing the word embedding vector. |
|
No description |
|
XML document |
|
Supertype for XmlElements and XmlTextNodes. |
Anomalies
Anomaly
description
(String)-
No description
suggestions
(FSArray of SuggestedAction)-
An array of the suggested actions to be taken for this anomaly.
category
(String)-
No description
Producers |
None declared |
---|---|
Consumers |
None declared |
Type | Description |
---|---|
No description |
|
No description |
SuggestedAction
replacement
(String)-
The text covered by the Anomaly annotation should be replaced with the contents of this feature.
certainty
(Float)-
A score representing how certain this suggested action is. Usually in [0,1].
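How a consumer might act on the replacement and certainty features can be sketched in plain Python. The classes below are simplified stand-ins rather than the UIMA-generated types, the helper `apply_best_suggestion` is hypothetical, and it is an assumption that a higher certainty marks the better suggestion.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SuggestedAction:
    replacement: str
    certainty: float  # usually in [0, 1]


@dataclass
class Anomaly:
    begin: int  # character offsets of the anomaly in the document text
    end: int
    suggestions: List[SuggestedAction] = field(default_factory=list)


def apply_best_suggestion(text: str, anomaly: Anomaly) -> str:
    """Replace the text covered by the anomaly with the most certain suggestion."""
    if not anomaly.suggestions:
        return text  # no suggestion available, leave the text unchanged
    best = max(anomaly.suggestions, key=lambda s: s.certainty)
    return text[:anomaly.begin] + best.replacement + text[anomaly.end:]
```

For example, an anomaly covering "teh" with the suggestions "the" (certainty 0.9) and "ten" (certainty 0.3) would be resolved to "the".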
Producers |
|
---|---|
Consumers |
None declared |
GrammarAnomaly
Producers |
|
---|---|
Consumers |
None declared |
SpellingAnomaly
Producers |
|
---|---|
Consumers |
Coreference
This type system contains two types: CoreferenceChain and CoreferenceLink. The CoreferenceChain marks the beginning of a chain. It points to the first CoreferenceLink in the chain. Each CoreferenceLink then points to the next link.
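The chain structure described here is a singly linked list. The following plain-Python sketch illustrates how such a chain could be traversed; the classes are simplified stand-ins for the UIMA types, and the `text` attribute and the `links` helper are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class CoreferenceLink:
    text: str  # stand-in for the text covered by the annotation
    next: Optional["CoreferenceLink"] = None


@dataclass
class CoreferenceChain:
    first: Optional[CoreferenceLink] = None


def links(chain: CoreferenceChain) -> Iterator[CoreferenceLink]:
    """Yield every link of the chain in order, starting at `first`."""
    link = chain.first
    while link is not None:
        yield link
        link = link.next
```

Because each link only knows its successor, finding the chain a given link belongs to requires walking the chains from their first link.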
CoreferenceChain
Marks the beginning of a chain.
first
(CoreferenceLink)-
This is the first coreference link in the coreference chain.
Producers |
CoreNlpCoreferenceResolver StanfordCoreferenceResolver Conll2012 (format) Tcf (format) |
---|---|
Consumers |
CoreferenceLink
A link in the coreference chain.
next
(CoreferenceLink)-
The coreference link that follows the current one in the chain, if there is one.
referenceType
(String)-
The role or type which the covered text has in the coreference chain.
referenceRelation
(String)-
The type of relation between this link and the next link in the chain.
Producers |
CoreNlpCoreferenceResolver StanfordCoreferenceResolver Tcf (format) |
---|---|
Consumers |
Tcf (format) |
Discourse
DiscourseArgument
Discourse argument (arg1, arg2)
parentRelationId
(Integer)-
ID of the parent relation
argumentNumber
(Integer)-
1 or 2
argumentType
(String)-
argument type, e.g. Cause, etc.
Producers |
None declared |
---|---|
Consumers |
None declared |
DiscourseAttribution
Attribution annotation (see PDTB for details); it is not connected to any particular relation because it may belong to two relations, and is therefore covered by DiscourseRelation
attributeId
(Integer)-
No description
Producers |
None declared |
---|---|
Consumers |
None declared |
DiscourseConnective
Discourse connective
connectiveType
(String)-
connective type
parentRelationId
(Integer)-
ID of the parent relation
Producers |
None declared |
---|---|
Consumers |
None declared |
DiscourseRelation
relationType
(String)-
Relation type (elaboration, contrast, etc.)
arg1
(RSTTreeNode)-
No description
arg2
(RSTTreeNode)-
No description
Producers |
None declared |
---|---|
Consumers |
None declared |
DiscourseRelation
Discourse relation
relationId
(Integer)-
id of the relation
arg1
(DiscourseArgument)-
arg 1
arg2
(DiscourseArgument)-
arg 2
Producers |
None declared |
---|---|
Consumers |
None declared |
Type | Description |
---|---|
Discourse relation |
EDU
originalText
(String)-
No description
Producers |
None declared |
---|---|
Consumers |
None declared |
ExplicitDiscourseRelation
Discourse relation
discourseConnective1
(DiscourseConnective)-
Discourse connective (in case of explicit relations)
discourseConnective2
(DiscourseConnective)-
Discourse connective (in case of explicit relations)
Producers |
None declared |
---|---|
Consumers |
None declared |
RSTTreeNode
RST Tree node
unitType
(String)-
N or S (nucleus/satellite)
Producers |
None declared |
---|---|
Consumers |
None declared |
Type | Description |
---|---|
No description |
|
No description |
ImplicitDiscourseRelation
Implicit discourse relation
Producers |
None declared |
---|---|
Consumers |
None declared |
Metadata
Recording tagset and tag descriptions in the CAS is still a feature under development. It is not supported by all components and it is not yet well defined. Expect changes and enhancements to this feature in future versions of DKPro Core.
DocumentMetaData
The DocumentMetaData annotation stores information about a single processed document. There can only be one of these annotations per CAS. The annotation is created by readers and contains information to uniquely identify the document from which a CAS was created. Writer components use this information when determining under which filename a CAS is stored.
There are two principal ways of identifying a document:
- collection id / document id: this simple system identifies a document within a collection. The ID of the collection and the document are each simple strings without any further semantics such as e.g. a hierarchy. For this reason, this identification scheme is not well suited to preserve information about directory structures.
- document base URI / document URI: this system identifies a document using a URI. The base URI is used to derive the relative path of the document with respect to the base location from which it was read. E.g. if the base URI is file:/texts and the document URI is file:/texts/english/text1.txt, then the relative path of the document is english/text1.txt. This information is used by writers to recreate the directory structure found under the base location in the target location.
It is possible and indeed common for a reader to initialize both systems of identification. If both systems are present, most writers default to using the URI-based system. However, most writers also allow forcing the use of the ID-based system.
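The two identification schemes and the default preference for the URI-based one can be illustrated with a small plain-Python sketch. The function names, the `use_document_id` switch, and the fallback behaviour are assumptions made for illustration; they do not reproduce the parameters of any actual DKPro Core writer.

```python
import posixpath
from typing import Optional


def relative_path(document_uri: str, document_base_uri: str) -> Optional[str]:
    """Return the document path relative to the base URI, or None if it is not below the base."""
    base = document_base_uri.rstrip("/") + "/"
    if not document_uri.startswith(base):
        return None
    return document_uri[len(base):]


def target_name(target_dir: str,
                document_uri: Optional[str] = None,
                document_base_uri: Optional[str] = None,
                document_id: Optional[str] = None,
                use_document_id: bool = False) -> str:
    """Pick a target location: URI-based by default, ID-based on request or as a fallback."""
    rel = None
    if not use_document_id and document_uri and document_base_uri:
        rel = relative_path(document_uri, document_base_uri)
    if rel is None:
        if document_id is None:
            raise ValueError("no usable document identification")
        rel = document_id
    return posixpath.join(target_dir, rel)
```

With the base URI file:/texts and the document URI file:/texts/english/text1.txt, the URI-based variant keeps the english/ subdirectory under the target location, whereas the ID-based variant places the file directly in the target directory.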
In addition to the features given here, there is a language feature inherited from UIMA's DocumentAnnotation. DKPro Core components expect a two-letter ISO 639-1 language code there.
documentTitle
(String)-
The human readable title of the document.
documentId
(String)-
The id of the document.
documentUri
(String)-
The URI of the document.
collectionId
(String)-
The ID of the whole document collection.
documentBaseUri
(String)-
Base URI of the document.
isLastSegment
(Boolean)-
CAS de-multipliers need to know whether a CAS is the last multiplied segment. Thus CAS multipliers should set this field to true for the last CAS they produce.
Producers |
ApplyChangesAnnotator AclAnthology (format) Ancora (format) AnnotatedGigaword (format) BlikiWikipedia (format) Bnc (format) Concrete (format) Conll2000 (format) Conll2002 (format) Conll2003 (format) Conll2006 (format) Conll2008 (format) Conll2009 (format) Conll2012 (format) ConllCoreNlp (format) ConllU (format) Html (format) HtmlDocument (format) ImsCwb (format) Jdbc (format) Lcc (format) Lif (format) Lxf (format) NegraExport (format) Nif (format) Nitf (format) Pdf (format) PennTreebankChunked (format) PennTreebankCombined (format) Perseus (format) PubAnnotation (format) RTF (format) Reuters21578Sgml (format) Reuters21578Txt (format) String (format) Tcf (format) Tei (format) Text (format) TigerXml (format) Tika (format) TuebaDZ (format) Tuepp (format) WikipediaArticleInfo (format) WikipediaRevision (format) WikipediaRevisionPair (format) WikipediaTemplateFilteredArticle (format) Xmi (format) Xml (format) XmlDocument (format) XmlText (format) XmlXPath (format) |
---|---|
Consumers |
ApplyChangesAnnotator BinaryCas (format) Concrete (format) Conll2000 (format) Conll2002 (format) Conll2003 (format) Conll2006 (format) Conll2008 (format) Conll2009 (format) Conll2012 (format) ConllCoreNlp (format) ConllU (format) DiTop (format) ImsCwb (format) InlineXml (format) Json (format) Lif (format) Lxf (format) Nif (format) PennTreebankCombined (format) PubAnnotation (format) SerializedCas (format) Tcf (format) Tei (format) Text (format) TigerXml (format) TokenizedText (format) WebannoTsv3X (format) Xmi (format) XmlDocument (format) |
MetaDataStringField
A general purpose annotation to store document-wide information in the form of arbitrary key-value string pairs.
key
(String)-
Name of a metadata field.
value
(String)-
The field value.
Producers |
|
---|---|
Consumers |
None declared |
TagDescription
Description of an individual tag.
name
(String)-
The name of the tag.
Producers |
None declared |
---|---|
Consumers |
None declared |
TagsetDescription
Information about a tagset (controlled vocabulary).
layer
(String)-
The layer to which the tagset applies. This is typically the name of an UIMA type such as "de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS".
name
(String)-
The name of the tagset.
tags
(FSArray of TagDescription)-
Descriptions of the tags belonging to this tagset.
componentName
(String)-
No description
modelLocation
(String)-
No description
modelVariant
(String)-
No description
modelLanguage
(String)-
No description
modelVersion
(String)-
No description
input
(Boolean)-
True if the tagset is used as input by the component/model, otherwise false.
Producers |
None declared |
---|---|
Consumers |
None declared |
Morphology
Morpheme
morphTag
(String)-
No description
Producers |
|
---|---|
Consumers |
None declared |
MorphologicalFeatures
Morphological categories that can be attached to tokens. The features are supposed to match the Universal Dependencies v1 features.
gender
(String)-
No description
number
(String)-
Singular/plural
case
(String)-
Nouns: nominative, genitive, dative, …
degree
(String)-
Adjectives: comparative/superlative
verbForm
(String)-
No description
tense
(String)-
Verbs: past tense, present tense, future tense, etc.
mood
(String)-
Verbs: indicative, imperative, subjunctive
voice
(String)-
Verbs: active/passive
definiteness
(String)-
Definite or indefinite
value
(String)-
The original morphological analysis results as produced by a tool or as recorded in a corpus (if available). If the categories were originally encoded in such a string, the other features are filled by analyzing this string. If the categories were provided separately, e.g. by different attributes in an XML-encoded corpus, this field may remain empty.
person
(String)-
Verbs: 1st, 2nd, 3rd person
aspect
(String)-
Verbs: perfective, imperfective
animacy
(String)-
No description
negative
(String)-
No description
numType
(String)-
No description
possessive
(String)-
No description
pronType
(String)-
No description
reflex
(String)-
No description
transitivity
(String)-
Verbs: transitive/intransitive
@deprecated
Producers |
MateMorphTagger RfTagger SfstAnnotator UDPipePosTagger Conll2006 (format) Conll2008 (format) Conll2009 (format) ConllU (format) |
---|---|
Consumers |
UDPipeParser Conll2006 (format) Conll2008 (format) Conll2009 (format) ConllU (format) |
POS
The part of speech of a word or a phrase.
PosValue
(String)-
Fine-grained POS tag. This is the tag as produced by a POS tagger or obtained from a reader.
coarseValue
(String)-
Coarse-grained POS tag. This may be produced by a POS tagger or reader in addition to the fine-grained tag.
Type | Description |
---|---|
Adjective @deprecated Use POS_ADJ instead |
|
Adposition @deprecated Use POS_ADP instead |
|
Adverb @deprecated Use POS_ADV instead |
|
Determiners and articles. |
|
Auxiliary verb @deprecated Use POS_AUX instead |
|
Numerals @deprecated Use POS_NUM instead |
|
Conjunction @deprecated Use POS_CONJ instead |
|
Determiner @deprecated Use POS_DET instead |
|
Interjection @deprecated Use POS_INTJ instead |
|
Nouns @deprecated Use POS_NOUN instead |
|
Noun @deprecated Use POS_NOUN instead |
|
Numeral @deprecated Use POS_NUM instead |
|
Catch-all for other categories such as abbreviations or foreign words @deprecated Use POS_X instead |
|
Particle @deprecated Use POS_PART instead |
|
Adjective |
|
Adposition |
|
Adverb |
|
Auxiliary verb |
|
Conjunction |
|
Determiner |
|
Interjection |
|
Noun |
|
Numeral |
|
Particle |
|
Pronoun |
|
Proper noun |
|
Punctuation |
|
Subordinating conjunction |
|
Symbol |
|
Verb |
|
Other |
|
Prepositions and postpositions @deprecated Use POS_ADP instead |
|
Pronoun @deprecated Use POS_PRON instead |
|
Pronoun @deprecated Use POS_PRON instead |
|
Proper noun @deprecated Use POS_PROPN instead |
|
Particles @deprecated Use POS_PART instead |
|
Punctuation marks @deprecated Use POS_PUNCT instead |
|
Punctuation @deprecated Use POS_PUNCT instead |
|
Subordinating conjunction @deprecated Use POS_SCONJ instead |
|
Symbol @deprecated Use POS_SYM instead |
|
Verbs @deprecated Use POS_VERB instead |
|
Verb @deprecated Use POS_VERB instead |
|
Other @deprecated Use POS_X instead |
ADJ
Adjective @deprecated Use POS_ADJ instead
Producers |
None declared |
---|---|
Consumers |
None declared |
ADP
Adposition @deprecated Use POS_ADP instead
Producers |
None declared |
---|---|
Consumers |
None declared |
ADV
Adverb @deprecated Use POS_ADV instead
Producers |
None declared |
---|---|
Consumers |
None declared |
ART
Determiners and articles. @deprecated Use POS_DET instead
Producers |
None declared |
---|---|
Consumers |
None declared |
AT
at-mention (indicates another user as a recipient of a tweet) @deprecated Use POS_AT instead
Producers |
None declared |
---|---|
Consumers |
None declared |
AUX
Auxiliary verb @deprecated Use POS_AUX instead
Producers |
None declared |
---|---|
Consumers |
None declared |
CARD
Numerals @deprecated Use POS_NUM instead
Producers |
None declared |
---|---|
Consumers |
None declared |
CONJ
Conjunction @deprecated Use POS_CONJ instead
Producers |
None declared |
---|---|
Consumers |
None declared |
DET
Determiner @deprecated Use POS_DET instead
Producers |
None declared |
---|---|
Consumers |
None declared |
DM
discourse marker, indications of continuation of a message across multiple tweets @deprecated Use POS_DM instead
Producers |
None declared |
---|---|
Consumers |
None declared |
EMO
emoticon @deprecated Use POS_EMO instead
Producers |
None declared |
---|---|
Consumers |
None declared |
HASH
Hashtag (indicates topic/category for tweet) @deprecated Use POS_HASH instead
Producers |
None declared |
---|---|
Consumers |
None declared |
INT
proper noun + verbal @deprecated Use POS_INT instead
Producers |
None declared |
---|---|
Consumers |
None declared |
INTJ
Interjection @deprecated Use POS_INTJ instead
Producers |
None declared |
---|---|
Consumers |
None declared |
N
Nouns @deprecated Use POS_NOUN instead
Producers |
None declared |
---|---|
Consumers |
None declared |
Type | Description |
---|---|
Common noun @deprecated Use POS_NOUN instead |
|
nominal + verbal @deprecated Use POS_NNV instead |
|
Proper noun @deprecated Use POS_PROPN instead |
|
proper noun + verbal @deprecated Use POS_NPV instead |
NN
Common noun @deprecated Use POS_NOUN instead
Producers |
None declared |
---|---|
Consumers |
None declared |
NNV
nominal + verbal @deprecated Use POS_NNV instead
Producers |
None declared |
---|---|
Consumers |
None declared |
NOUN
Noun @deprecated Use POS_NOUN instead
Producers |
None declared |
---|---|
Consumers |
None declared |
NP
Proper noun @deprecated Use POS_PROPN instead
Producers |
None declared |
---|---|
Consumers |
None declared |
NPV
proper noun + verbal @deprecated Use POS_NPV instead
Producers |
None declared |
---|---|
Consumers |
None declared |
NUM
Numeral @deprecated Use POS_NUM instead
Producers |
None declared |
---|---|
Consumers |
None declared |
O
Catch-all for other categories such as abbreviations or foreign words @deprecated Use POS_X instead
Producers |
None declared |
---|---|
Consumers |
None declared |
Type | Description |
---|---|
at-mention (indicates another user as a recipient of a tweet) @deprecated Use POS_AT instead |
|
discourse marker, indications of continuation of a message across multiple tweets @deprecated Use POS_DM instead |
|
emoticon @deprecated Use POS_EMO instead |
|
Hashtag (indicates topic/category for tweet) @deprecated Use POS_HASH instead |
|
proper noun + verbal @deprecated Use POS_INT instead |
|
URL or email address @deprecated Use POS_URL instead |
PART
Particle @deprecated Use POS_PART instead
Producers |
None declared |
---|---|
Consumers |
None declared |
POS_ADJ
Adjective
Producers |
None declared |
---|---|
Consumers |
None declared |
POS_ADP
Adposition
Producers |
None declared |
---|---|
Consumers |
None declared |
POS_ADV
Adverb
Producers |
None declared |
---|---|
Consumers |
None declared |
POS_AT
at-mention (indicates another user as a recipient of a tweet)
Producers |
None declared |
---|---|
Consumers |
None declared |
POS_AUX
Auxiliary verb
Producers |
None declared |
---|---|
Consumers |
None declared |
POS_CONJ
Conjunction
Producers |
None declared |
---|---|
Consumers |
None declared |
POS_DET
Determiner
Producers |
None declared |
---|---|
Consumers |
None declared |
POS_DM
discourse marker, indications of continuation of a message across multiple tweets
Producers |
None declared |
---|---|
Consumers |
None declared |
POS_EMO
emoticon
Producers |
None declared |
---|---|
Consumers |
None declared |
POS_HASH
Hashtag (indicates topic/category for tweet)
Producers |
None declared |
---|---|
Consumers |
None declared |
POS_INT
proper noun + verbal
Producers |
None declared |
---|---|
Consumers |
None declared |
POS_INTJ
Interjection
Producers |
None declared |
---|---|
Consumers |
None declared |
POS_NNV
nominal + verbal
Producers |
None declared |
---|---|
Consumers |
None declared |
POS_NOUN
Noun
Producers |
None declared |
---|---|
Consumers |
None declared |
Type | Description |
---|---|
nominal + verbal |
|
proper noun + verbal |
POS_NPV
proper noun + verbal
Producers |
None declared |
---|---|
Consumers |
None declared |
POS_NUM
Numeral
Producers |
None declared |
---|---|
Consumers |
None declared |
POS_PART
Particle
Producers |
None declared |
---|---|
Consumers |
None declared |
POS_PRON
Pronoun
Producers |
None declared |
---|---|
Consumers |
None declared |
POS_PROPN
Proper noun
Producers |
None declared |
---|---|
Consumers |
None declared |
POS_PUNCT
Punctuation
Producers |
None declared |
---|---|
Consumers |
None declared |
POS_SCONJ
Subordinating conjunction
Producers |
None declared |
---|---|
Consumers |
None declared |
POS_SYM
Symbol
Producers |
None declared |
---|---|
Consumers |
None declared |
POS_URL
URL or email address
Producers |
None declared |
---|---|
Consumers |
None declared |
POS_VERB
Verb
Producers |
None declared |
---|---|
Consumers |
None declared |
POS_X
Other
Producers |
None declared |
---|---|
Consumers |
None declared |
Type | Description |
---|---|
at-mention (indicates another user as a recipient of a tweet) |
|
discourse marker, indications of continuation of a message across multiple tweets |
|
emoticon |
|
Hashtag (indicates topic/category for tweet) |
|
proper noun + verbal |
|
URL or email address |
PP
Prepositions and postpositions @deprecated Use POS_ADP instead
Producers |
None declared |
---|---|
Consumers |
None declared |
PR
Pronoun @deprecated Use POS_PRON instead
Producers |
None declared |
---|---|
Consumers |
None declared |
PRON
Pronoun @deprecated Use POS_PRON instead
Producers |
None declared |
---|---|
Consumers |
None declared |
PROPN
Proper noun @deprecated Use POS_PROPN instead
Producers |
None declared |
---|---|
Consumers |
None declared |
PRT
Particles @deprecated Use POS_PART instead
Producers |
None declared |
---|---|
Consumers |
None declared |
PUNC
Punctuation marks @deprecated Use POS_PUNCT instead
Producers |
None declared |
---|---|
Consumers |
None declared |
PUNCT
Punctuation @deprecated Use POS_PUNCT instead
Producers |
None declared |
---|---|
Consumers |
None declared |
SCONJ
Subordinating conjunction @deprecated Use POS_SCONJ instead
Producers |
None declared |
---|---|
Consumers |
None declared |
SYM
Symbol @deprecated Use POS_SYM instead
Producers |
None declared |
---|---|
Consumers |
None declared |
URL
URL or email address @deprecated Use POS_URL instead
Producers |
None declared |
---|---|
Consumers |
None declared |
V
Verbs @deprecated Use POS_VERB instead
Producers |
None declared |
---|---|
Consumers |
None declared |
VERB
Verb @deprecated Use POS_VERB instead
Producers |
None declared |
---|---|
Consumers |
None declared |
X
Other @deprecated Use POS_X instead
Producers |
None declared |
---|---|
Consumers |
None declared |
NYTArticleMetaData
ArticleMetaData
A document annotation that describes the metadata of a newspaper article.
guid
(Integer)-
The GUID field specifies a (4-byte) integer that is guaranteed to be unique for every document in the corpus.
alternateUrl
(String)-
This field specifies the location on nytimes.com of the article. When present, this URL is preferred to the URL field on articles published on or after April 02, 2006, as the linked page will have richer content.
url
(String)-
This field specifies the location on nytimes.com of the article. The 'Alternative Url' field is preferred to this field on articles published on or after April 02, 2006, as the linked page will have richer content.
publicationDate
(String)-
This field specifies the date of the article’s publication. This field is specified in the format YYYYMMDD’T’HHMMSS where:
- YYYY is the four-digit year.
- MM is the two-digit month [01-12].
- DD is the two-digit day [01-31].
- T is a constant value.
- HH is the two-digit hour [00-23].
- MM is the two-digit minute-past-the-hour [00-59].
- SS is the two-digit seconds-past-the-minute [00-59].
Please note that values for HH, MM, and SS are not defined for this corpus, that is to say HH, MM, and SS are always defined to be '00'.
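Since the publication date is stored as a string, a consumer has to parse it before doing date arithmetic. A minimal sketch using Python's standard library is shown here; the function name `parse_publication_date` is illustrative only.

```python
from datetime import datetime


def parse_publication_date(value: str) -> datetime:
    """Parse a publication date given in the YYYYMMDD'T'HHMMSS format."""
    return datetime.strptime(value, "%Y%m%dT%H%M%S")
```

Because the time portion is always '00' in this corpus, only the date part of the result carries information.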
typesOfMaterial
(StringArray)-
This field specifies a normalized list of terms describing the general editorial category of the article. These tags are algorithmically assigned and manually verified by nytimes.com production staff. Examples Include:
- REVIEW
- OBITUARY
- ANALYSIS
headline
(String)-
This field specifies the headline of the article as it appeared in the print edition of the New York Times.
onlineHeadline
(String)-
This field specifies the headline displayed with the article on nytimes.com. Often this differs from the headline used in print.
columnName
(String)-
If the article is part of a regular column, this field specifies the name of that column. Sample Column Names:
- World News Briefs
- WEDDINGS
- The Accessories Channel
author
(String)-
This field is based on the normalized byline in the original corpus data: "The Normalized Byline field is the byline normalized to the form (last name, first name)".
descriptors
(StringArray)-
The 'descriptors' field specifies a list of descriptive terms drawn from a normalized controlled vocabulary corresponding to subjects mentioned in the article. These tags are hand-assigned by a team of library scientists working in the New York Times Indexing service. Examples Include:
- ECONOMIC CONDITIONS AND TRENDS
- AIRPLANES
- VIOLINS
onlineDescriptors
(StringArray)-
This field specifies a list of descriptors from a normalized controlled vocabulary that correspond to topics mentioned in the article. These tags are algorithmically assigned and manually verified by nytimes.com production staff. Examples Include:
- Marriages
- Parks and Other Recreation Areas
- Cooking and Cookbooks
generalOnlineDescriptors
(String)-
The 'general online descriptors' field specifies a list of descriptors that are at a higher level of generality than the other tags associated with the article. These tags are algorithmically assigned and manually verified by nytimes.com production staff. Examples Include:
- Surfing
- Venice Biennale
- Ranches
onlineSection
(String)-
This field specifies the section(s) on nytimes.com in which the article is placed. If the article is placed in multiple sections, this field will be specified as a ';'-delimited list.
section
(String)-
This field specifies the section of the paper in which the article appears. This is not the name of the section, but rather a letter or number that indicates the section.
taxonomicClassifiers
(StringArray)-
This field specifies a list of taxonomic classifiers that place this article into a hierarchy of articles. The individual terms of each taxonomic classifier are separated with the '/' character. These tags are algorithmically assigned and manually verified by nytimes.com production staff. Examples Include:
- Top/Features/Travel/Guides/Destinations/North America/United States/Arizona
- Top/News/U.S./Rockies
- Top/Opinion
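Because the terms of each taxonomic classifier are separated with '/', a list of classifiers can be folded into a hierarchy. The following plain-Python sketch shows one way to do that; the nested-dictionary representation and the name `build_taxonomy` are assumptions chosen for illustration.

```python
from typing import Dict, List


def build_taxonomy(classifiers: List[str]) -> Dict[str, dict]:
    """Nest '/'-separated classifier terms into a tree of dictionaries."""
    root: Dict[str, dict] = {}
    for classifier in classifiers:
        node = root
        for term in classifier.split("/"):
            # Descend into the child for this term, creating it if needed.
            node = node.setdefault(term, {})
    return root
```

Classifiers that share a prefix such as Top end up under the same parent node.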
Producers |
Nitf (format) |
---|---|
Consumers |
None declared |
Phonetics
PhoneticTranscription
Represents the phonetic transcription of some textual element (usually a Token). Phonetic transcriptions are e.g. generated by transcription processes like Soundex or Metaphone.
transcription
(String)-
The actual transcription
name
(String)-
The name of the transcription process that was used
Producers |
ColognePhoneticTranscriptor DoubleMetaphonePhoneticTranscriptor MetaphonePhoneticTranscriptor SoundexPhoneticTranscriptor |
---|---|
Consumers |
None declared |
ReadabilityScore
Segmentation
The segmentation type system consists of three primary areas: tokenization (including sentences), compound words, and document structure.
The Sentence annotation type is simply a span with no further attributes.
The Token type may be explicitly linked to a part of speech, lemma, and stem. If any of these annotations is present, the token is expected to refer to it explicitly. If more than one annotation of such a type is present, e.g. multiple part-of-speech annotations, the token is expected to link to the most probable one, while the others are merely located at the same offsets.
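The linking convention for multiple candidate annotations can be sketched in plain Python. Note that the `score` attribute is purely hypothetical: this document does not list a probability feature on POS, so how "most probable" is determined is left to the producing component. The classes are simplified stand-ins for the UIMA types.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class POS:
    begin: int
    end: int
    value: str
    score: float  # hypothetical confidence, not a feature of the actual type


@dataclass
class Token:
    begin: int
    end: int
    pos: Optional[POS] = None


def link_most_probable_pos(token: Token, candidates: List[POS]) -> None:
    """Link the token to the best-scored POS annotation located at the same offsets."""
    same_span = [p for p in candidates
                 if (p.begin, p.end) == (token.begin, token.end)]
    token.pos = max(same_span, key=lambda p: p.score, default=None)
```

The remaining candidates stay in place at the same offsets; only the explicit link from the token singles out the preferred one.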
Additionally, the Token can link into the syntactic constituency structure via the parent feature.
The document structure can be encoded using the Div types. The type Div itself is a generic type representing some element of the document structure more closely specified by the divType attribute. The value of divType corresponds to the tag used in some original document format or to the output of a text segmentation tool. E.g. when reading an HTML document, the divType for a paragraph would be p, whereas in a DocBook XML file, it would instead be para.
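Since divType carries the tag of the original format, code that wants to treat paragraphs uniformly has to know the tags used by each format. A minimal plain-Python sketch follows, using only the two tags mentioned above; the `Div` class is a simplified stand-in and the helper `paragraphs` is hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Tags that denote a paragraph: HTML uses "p", DocBook uses "para".
PARAGRAPH_DIV_TYPES = {"p", "para"}


@dataclass
class Div:
    begin: int
    end: int
    div_type: str  # corresponds to the divType feature


def paragraphs(divs: List[Div]) -> List[Div]:
    """Select the structure elements whose divType denotes a paragraph."""
    return [d for d in divs if d.div_type in PARAGRAPH_DIV_TYPES]
```

A format-independent alternative is to rely on the dedicated subtypes for typical structural elements where a reader provides them.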
For typical structural elements, the subtypes Document, Heading, and Paragraph are available. Document is rarely used, since the basic assumption is that a CAS always represents a document.
Compound
This type represents a compound word that can be decomposed, e.g. flowerpot. Each Compound must have at least two Splits.
splits
(FSArray of Split)-
A word that can be decomposed into different parts.
Producers |
|
---|---|
Consumers |
None declared |
Div
Document structure element.
divType
(String)-
No description
id
(String)-
If this unit had an ID in the source format from which it was imported, it may be stored here. IDs are typically not assigned by DKPro Core components. If an ID is present, it should be respected by writers.
Producers |
None declared |
---|---|
Consumers |
None declared |
Type | Description |
---|---|
No description |
|
Document title, section heading, etc. |
|
No description |
JapaneseToken
kana
(String)-
No description
ibo
(String)-
No description
kei
(String)-
No description
dan
(String)-
Specifies the kind of the verb if the current token is a verb. Either it is a vowel stem verb (ichi-dan) or a consonant stem verb (go-dan). Blank if not a verb.
Producers |
|
---|---|
Consumers |
None declared |
Lemma
value
(String)-
No description
NGram
text
(String)-
No description
Producers |
|
---|---|
Consumers |
None declared |
Sentence
id
(String)-
If this unit had an ID in the source format from which it was imported, it may be stored here. IDs are typically not assigned by DKPro Core components. If an ID is present, it should be respected by writers.
Split
This type represents a part of a decompounding word. A Split can be either a CompoundPart or a LinkingMorpheme.
splits
(FSArray of Split)-
Sub-splits of the current split.
Producers |
|
---|---|
Consumers |
None declared |
Type | Description |
---|---|
A CompoundPart represents one fragment from the compounding word. |
|
This type represents a linking morpheme between two CompoundParts. |
Stem
value
(String)-
No description
Producers |
CisStemmer LancasterStemmer MyStemStemmer OpenNlpSnowballStemmer SmileLancasterStemmer SnowballStemmer Nif (format) |
---|---|
Consumers |
Nif (format) |
StopWord
Producers |
None declared |
---|---|
Consumers |
SurfaceForm
This annotation can be used to indicate an alternate surface form. E.g. some corpora consider a normalized form of the text with resolved contractions as the canonical form and only maintain the original surface form as secondary information. One example is the CoNLL-U format.
value
(String)-
Alternate surface form.
Producers |
None declared |
---|---|
Consumers |
None declared |
Token
Token is one of the two types commonly produced by a segmenter (the other being Sentence). A Token usually represents a word, although it may be used to represent multiple tightly connected words (e.g. "New York") or parts of a word (e.g. the possessive "'s"). One may choose to split compound words into multiple tokens, e.g. ("CamelCase" -> "Camel", "Case"; "Zauberstab" -> "Zauber", "stab"). Most processing components operate on Tokens, usually within the limits of the surrounding Sentence. E.g. a part-of-speech tagger analyses each Token in a Sentence and assigns a part-of-speech to each Token.
parent
(Annotation)-
The parent of this token. This feature is meant to be used when the token participates in a constituency parse; it then refers to the constituent containing this token. The type of this feature is Annotation to avoid adding a dependency on the syntax API module.
lemma
(Lemma)-
No description
stem
(Stem)-
No description
pos
(POS)-
No description
morph
(MorphologicalFeatures)-
The morphological features associated with this token.
id
(String)-
If this unit had an ID in the source format from which it was imported, it may be stored here. IDs are typically not assigned by DKPro Core components. If an ID is present, it should be respected by writers.
form
(TokenForm)-
Potentially normalized form of the token text that should be used instead of the covered text if set.
syntacticFunction
(String)-
No description
order
(Integer)-
Disambiguates the token order for tokens which have the same offsets, e.g. when the contraction "à" is analyzed as two tokens "a" and "a".
Type | Description |
---|---|
No description |
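The role of the `order` feature can be illustrated with a small sketch. It uses a hypothetical `Tok` dataclass as a stand-in for the Token annotation (it is not a DKPro Core class), and the sort key is an assumption chosen for illustration: offsets first, `order` only as a tie-breaker.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    """Hypothetical stand-in for a Token annotation (not a DKPro Core class)."""
    begin: int
    end: int
    text: str
    order: int = 0


def in_reading_order(tokens):
    # Sort by offsets first; `order` only breaks ties between tokens
    # that cover exactly the same span.
    return sorted(tokens, key=lambda t: (t.begin, t.end, t.order))


# "Vou à praia": the contraction "à" at offsets 4..5 is analyzed as two tokens.
tokens = [
    Tok(4, 5, "a", order=2),  # second reading of the contraction
    Tok(6, 11, "praia"),
    Tok(0, 3, "Vou"),
    Tok(4, 5, "a", order=1),  # first reading of the contraction
]
ordered = in_reading_order(tokens)
```

Without `order`, the two tokens at offsets 4..5 would be indistinguishable to any offset-based sort.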
TokenForm
An alternative token text that should be used instead of the covered text if set on a token.
value
(String)-
No description
Producers |
None declared |
---|---|
Consumers |
None declared |
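The fallback rule described for `form` (use the TokenForm value if set, otherwise the covered text) might be sketched as follows. `FormTok` and `token_text` are hypothetical names introduced for illustration only; the `form` field here holds the string that would live in `TokenForm.value`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FormTok:
    """Hypothetical stand-in for a Token with an optional TokenForm value."""
    begin: int
    end: int
    form: Optional[str] = None


def token_text(document_text: str, token: FormTok) -> str:
    # Prefer the explicitly set form; otherwise fall back to the covered text.
    if token.form is not None:
        return token.form
    return document_text[token.begin:token.end]


text = "I can't"
tokens = [FormTok(0, 1), FormTok(2, 4), FormTok(4, 7, form="not")]
words = [token_text(text, t) for t in tokens]
```

Here the third token covers "n't" in the document text but is read as "not" because its form is set.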
CompoundPart
A CompoundPart represents one fragment of a compound word. It can itself contain further CompoundParts if it can be split again, so a decompounded word is stored as a decompounding tree.
Producers |
|
---|---|
Consumers |
None declared |
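The decompounding tree formed by Split, CompoundPart and LinkingMorpheme can be pictured with a minimal sketch. `SplitNode` is a hypothetical stand-in (its `kind` field mimics the two subtypes), and the German example word is chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SplitNode:
    """Hypothetical stand-in for Split; `kind` mimics the two subtypes."""
    text: str
    kind: str  # "CompoundPart" or "LinkingMorpheme"
    splits: List["SplitNode"] = field(default_factory=list)


def leaves(node):
    # A node without sub-splits is a leaf of the decompounding tree.
    if not node.splits:
        return [node]
    result = []
    for child in node.splits:
        result.extend(leaves(child))
    return result


# "Geburtstagskuchen" -> "Geburtstag" + "s" + "kuchen",
# where "Geburtstag" splits again into "Geburt" + "s" + "tag".
geburtstag = SplitNode("Geburtstag", "CompoundPart", [
    SplitNode("Geburt", "CompoundPart"),
    SplitNode("s", "LinkingMorpheme"),
    SplitNode("tag", "CompoundPart"),
])
root = SplitNode("Geburtstagskuchen", "CompoundPart", [
    geburtstag,
    SplitNode("s", "LinkingMorpheme"),
    SplitNode("kuchen", "CompoundPart"),
])

fragments = [n.text for n in leaves(root)]
parts_only = [n.text for n in leaves(root) if n.kind == "CompoundPart"]
```

Concatenating the leaves in order reproduces the surface string, while filtering by kind yields only the meaningful fragments.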
Document
Producers |
None declared |
---|---|
Consumers |
None declared |
Heading
Document title, section heading, etc.
Producers |
Html (format) HtmlDocument (format) Nif (format) Pdf (format) |
---|---|
Consumers |
Nif (format) |
LinkingMorpheme
This type represents a linking morpheme between two CompoundParts.
Producers |
|
---|---|
Consumers |
None declared |
Paragraph
Producers |
JTokSegmenter ParagraphSplitter Html (format) HtmlDocument (format) Lif (format) Nif (format) Pdf (format) Tei (format) XcesBasicXml (format) XcesXml (format) |
---|---|
Consumers |
Lif (format) Nif (format) Tei (format) XcesBasicXml (format) XcesXml (format) |
Semantics
NamedEntity
Named entities refer e.g. to persons, locations, organizations and so on. They often consist of multiple tokens.
value
(String)-
The class/category of the named entity, e.g. person, location, etc.
identifier
(String)-
Identifier of the named entity, e.g. a reference into a person database.
Producers |
CoreNlpNamedEntityRecognizer LingPipeNamedEntityRecognizer Nlp4JNamedEntityRecognizer OpenNlpNamedEntityRecognizer SemanticFieldAnnotator StanfordNamedEntityRecognizer Conll2002 (format) Conll2003 (format) Conll2012 (format) ConllCoreNlp (format) Lif (format) Nif (format) Tcf (format) Tei (format) |
---|---|
Consumers |
CoreNlpCoreferenceResolver OpenNlpNamedEntityRecognizerTrainer StanfordCoreferenceResolver StanfordNamedEntityRecognizerTrainer Conll2002 (format) Conll2003 (format) Conll2012 (format) ConllCoreNlp (format) Lif (format) Nif (format) Tcf (format) Tei (format) |
Type | Description |
---|---|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
SemArg
The SemArg annotation is attached to semantic arguments of semantic predicates. Semantic arguments are characterized by their semantic role, e.g. Agent, Experiencer, Topic. The semantic role of an argument is related to its semantic type (for communication verbs, the Agent can be a person or an organization, but typically not food).
Producers |
ClearNlpSemanticRoleLabeler MateSemanticRoleLabeler Conll2008 (format) Conll2009 (format) Conll2012 (format) TigerXml (format) |
---|---|
Consumers |
SemArgLink
The SemArgLink type is used to attach SemPred annotations to their respective SemArg annotations while giving each link a role.
role
(String)-
The role which the argument takes. The value depends on the theory being used, e.g. Arg0, Arg1, etc. or Buyer, Seller, etc.
target
(SemArg)-
The target argument.
Producers |
None declared |
---|---|
Consumers |
None declared |
SemPred
One of the predicates of a sentence (often a main verb, but nouns and adjectives can also be predicates). The SemPred annotation can be attached to predicates in a sentence. Semantic predicates express events or situations and take semantic arguments expressing the participants in these events or situations. All forms of main verbs can be annotated with a SemPred. However, there are also many nouns and adjectives that take arguments and can thus be annotated with a SemPred, e.g. event nouns, such as "suggestion" (with arguments what and by whom), or relational adjectives, such as "proud" (with arguments who and of what).
arguments
(FSArray of SemArgLink)-
The predicate’s arguments.
category
(String)-
A more detailed specification of the predicate type depending on the theory being used, e.g. a frame name.
Producers |
ClearNlpSemanticRoleLabeler MateSemanticRoleLabeler Conll2008 (format) Conll2009 (format) Conll2012 (format) TigerXml (format) |
---|---|
Consumers |
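The relationship between SemPred, SemArgLink and SemArg might be sketched as below. The classes `Arg`, `ArgLink` and `Pred` are hypothetical stand-ins, and the frame-style category names are illustrative values, not taken from any component. The point of the sketch is that the role lives on the link, so one argument can fill different roles for different predicates.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Arg:
    """Hypothetical stand-in for SemArg."""
    text: str


@dataclass
class ArgLink:
    """Hypothetical stand-in for SemArgLink: a role plus a target argument."""
    role: str
    target: Arg


@dataclass
class Pred:
    """Hypothetical stand-in for SemPred."""
    text: str
    category: str
    arguments: List[ArgLink] = field(default_factory=list)


def roles(pred):
    # Map each role name to the text of the argument it points to.
    return {link.role: link.target.text for link in pred.arguments}


# "Mary wanted to buy a book": "Mary" is an argument of both predicates.
mary = Arg("Mary")
book = Arg("a book")
wanted = Pred("wanted", "want.01", [ArgLink("Arg0", mary)])
buy = Pred("buy", "buy.01", [ArgLink("Arg0", mary), ArgLink("Arg1", book)])
```

Both predicates reference the same `mary` object; only the links differ.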
SemanticArgument
The SemanticArgument annotation is attached to semantic arguments of semantic predicates. Semantic arguments are characterized by their semantic role, e.g. Agent, Experiencer, Topic. The semantic role of an argument is related to its semantic type (for communication verbs, the Agent can be a person or an organization, but typically not food). The semantic type of arguments is not yet covered by the SemanticType. @deprecated Use SemArg instead.
role
(String)-
The role which the argument takes. The value depends on the theory being used, e.g. Arg0, Arg1, etc. or Buyer, Seller, etc.
Producers |
None declared |
---|---|
Consumers |
None declared |
SemanticField
The SemanticField is a coarse-grained semantic category that can be attached to nouns, verbs or adjectives. Semantic field information is present e.g. in WordNet as lexicographer file names. Previously, this kind of semantic information has also been called supersenses or semantic types.
value
(String)-
The value or name of the semantic field. Examples of semantic field values are: location, artifact, event, communication, attribute
Producers |
None declared |
---|---|
Consumers |
None declared |
SemanticPredicate
One of the predicates of a sentence (often a main verb, but nouns and adjectives can also be predicates). The SemanticPredicate annotation can be attached to predicates in a sentence. Semantic predicates express events or situations and take semantic arguments expressing the participants in these events or situations. All forms of main verbs can be annotated with a SemanticPredicate. However, there are also many nouns and adjectives that take arguments and can thus be annotated with a SemanticPredicate, e.g. event nouns, such as "suggestion" (with arguments what and by whom), or relational adjectives, such as "proud" (with arguments who and of what). @deprecated Use SemPred instead.
category
(String)-
A more detailed specification of the predicate type depending on the theory being used, e.g. a frame name.
arguments
(FSArray of SemanticArgument)-
The predicate’s arguments.
Producers |
None declared |
---|---|
Consumers |
None declared |
StanfordSentimentAnnotation
Stanford CoreNLP Sentiment annotation
veryNegative
(Double)-
Value of veryNegative
negative
(Double)-
Value of negative
neutral
(Double)-
Value of neutral
positive
(Double)-
Value of positive
veryPositive
(Double)-
Value of veryPositive
Producers |
|
---|---|
Consumers |
None declared |
WordSense
value
(String)-
The sense identifier.
Producers |
Conll2012 (format) |
---|---|
Consumers |
Conll2012 (format) |
Animal
Producers |
None declared |
---|---|
Consumers |
None declared |
Cardinal
Producers |
None declared |
---|---|
Consumers |
None declared |
ContactInfo
Producers |
None declared |
---|---|
Consumers |
None declared |
Date
Producers |
None declared |
---|---|
Consumers |
None declared |
Disease
Producers |
None declared |
---|---|
Consumers |
None declared |
Event
Producers |
None declared |
---|---|
Consumers |
None declared |
Fac
Producers |
None declared |
---|---|
Consumers |
None declared |
FacDesc
Producers |
None declared |
---|---|
Consumers |
None declared |
Game
Producers |
None declared |
---|---|
Consumers |
None declared |
Gpe
Producers |
None declared |
---|---|
Consumers |
None declared |
GpeDesc
Producers |
None declared |
---|---|
Consumers |
None declared |
Language
Producers |
None declared |
---|---|
Consumers |
None declared |
Law
Producers |
None declared |
---|---|
Consumers |
None declared |
Location
Producers |
None declared |
---|---|
Consumers |
None declared |
Money
Producers |
None declared |
---|---|
Consumers |
None declared |
Nationality
Producers |
None declared |
---|---|
Consumers |
None declared |
Norp
Producers |
None declared |
---|---|
Consumers |
None declared |
Ordinal
Producers |
None declared |
---|---|
Consumers |
None declared |
OrgDesc
Producers |
None declared |
---|---|
Consumers |
None declared |
Organization
Producers |
None declared |
---|---|
Consumers |
None declared |
PerDesc
Producers |
None declared |
---|---|
Consumers |
None declared |
Percent
Producers |
None declared |
---|---|
Consumers |
None declared |
Person
Producers |
None declared |
---|---|
Consumers |
None declared |
Plant
Producers |
None declared |
---|---|
Consumers |
None declared |
Product
Producers |
None declared |
---|---|
Consumers |
None declared |
ProductDesc
Producers |
None declared |
---|---|
Consumers |
None declared |
Quantity
Producers |
None declared |
---|---|
Consumers |
None declared |
Substance
Producers |
None declared |
---|---|
Consumers |
None declared |
Time
Producers |
None declared |
---|---|
Consumers |
None declared |
WorkOfArt
Producers |
None declared |
---|---|
Consumers |
None declared |
Syntax
Chunk
chunkValue
(String)-
No description
Producers |
OpenNlpChunker TreeTaggerChunker Conll2000 (format) Conll2003 (format) PennTreebankChunked (format) TuebaDZ (format) |
---|---|
Consumers |
Type | Description |
---|---|
ADJC | adjective chunks |
ADVC | adverb chunks |
CONCJ | complex coordinating conjunctions such as "as well (as)" or "rather (than)" |
INTJ | interjection |
LST | enumeration symbol |
NC | noun chunk (non-recursive noun phrase) |
O | other or outside a chunk |
PC | prepositional chunk |
PRT | verb particle |
VC | verb complex |
Constituent
constituentType
(String)-
No description
parent
(Annotation)-
The parent constituent
children
(FSArray of Annotation)-
No description
syntacticFunction
(String)-
No description
Producers |
BerkeleyParser CoreNlpParser OpenNlpParser StanfordParser Lif (format) NegraExport (format) PennTreebankCombined (format) Tei (format) TigerXml (format) |
---|---|
Consumers |
CoreNlpCoreferenceResolver StanfordCoreferenceResolver StanfordDependencyConverter Lif (format) PennTreebankCombined (format) Tei (format) TigerXml (format) |
Type | Description |
---|---|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
This category is called PRN in the Penn Treebank tagset. |
|
No description |
|
This type is no longer used and no JCas wrapper is generated for it because on Windows, it conflicts with the reserved device name for printers. |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
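The `parent` and `children` features of Constituent (and the `parent` feature of Token) link nodes into a tree that can be walked in both directions. A minimal sketch, assuming a hypothetical `Node` class that stands in for both constituents and tokens:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a Constituent or a Token in a parse tree."""
    label: str  # constituentType, or the token text for leaves
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def attach(parent, child):
    # Keep both directions of the link consistent.
    child.parent = parent
    parent.children.append(child)
    return child


def path_to_root(node):
    # Follow `parent` references upward until the root is reached.
    labels = []
    while node is not None:
        labels.append(node.label)
        node = node.parent
    return labels


root = Node("ROOT")
s = attach(root, Node("S"))
np = attach(s, Node("NP"))
cat = attach(np, Node("cat"))
path = path_to_root(cat)
```

Starting from a token, following `parent` yields the chain of constituents that contain it.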
Dependency
A dependency relation between two tokens. The dependency annotation begin and end offsets correspond to those of the dependent.
Producers |
ClearNlpParser CoreNlpDependencyParser CoreNlpParser MaltParser MateParser MstParser Nlp4JDependencyParser StanfordDependencyConverter StanfordParser UDPipeParser Conll2006 (format) Conll2008 (format) Conll2009 (format) ConllCoreNlp (format) ConllU (format) Lif (format) Lxf (format) Perseus (format) Tcf (format) |
---|---|
Consumers |
ClearNlpSemanticRoleLabeler MateSemanticRoleLabeler Conll2006 (format) Conll2008 (format) Conll2009 (format) ConllCoreNlp (format) ConllU (format) Lif (format) Lxf (format) Tcf (format) |
Type | Description |
---|---|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
Dependency tree root. |
|
No description |
|
No description |
|
No description |
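The statement that a Dependency annotation takes its begin and end offsets from the dependent can be made concrete with a short sketch. `Span` and `Dep`, and the field names `governor`, `dependent` and `relation`, are illustrative assumptions rather than the actual feature names.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical stand-in for a token with character offsets."""
    begin: int
    end: int


@dataclass
class Dep:
    """Hypothetical stand-in for a Dependency relation between two tokens."""
    governor: Span
    dependent: Span
    relation: str

    @property
    def begin(self):
        # The relation's span is that of the dependent, not the governor.
        return self.dependent.begin

    @property
    def end(self):
        return self.dependent.end


text = "Dogs bark"
dogs, bark = Span(0, 4), Span(5, 9)
nsubj = Dep(governor=bark, dependent=dogs, relation="nsubj")
covered = text[nsubj.begin:nsubj.end]
```

The covered text of the relation is therefore the dependent word, here "Dogs".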
PennTree
The Penn Treebank-style phrase structure string.
PennTree
(String)-
Contains a Penn Treebank-style representation of a tree.
TransformationNames
(String)-
The name(s) of the transformation(s) that have been performed on the PennTree
Producers |
|
---|---|
Consumers |
TGrep (format) |
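A Penn Treebank-style string such as the one stored in the `PennTree` feature uses nested brackets. The following sketch parses a simple, well-formed bracketed string into nested tuples; it is an illustrative parser, not the one used by any DKPro Core component, and it does not handle unlabeled brackets or escaped parentheses.

```python
def parse_penn(s):
    """Parse a bracketed tree into (label, children) tuples; leaves are strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing input after tree")
    return tree


def words(tree):
    # Collect the leaf strings from left to right.
    _, children = tree
    out = []
    for child in children:
        out.extend(words(child) if isinstance(child, tuple) else [child])
    return out


tree = parse_penn("(ROOT (S (NP (DT The) (NN cat)) (VP (VBD sat))))")
sentence = words(tree)
```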
ABBREV
Producers |
None declared |
---|---|
Consumers |
None declared |
ACOMP
Producers |
None declared |
---|---|
Consumers |
None declared |
ADJC
adjective chunks
Producers |
None declared |
---|---|
Consumers |
None declared |
ADJP
Producers |
None declared |
---|---|
Consumers |
None declared |
ADVC
adverb chunks
Producers |
None declared |
---|---|
Consumers |
None declared |
ADVCL
Producers |
None declared |
---|---|
Consumers |
None declared |
ADVMOD
Producers |
None declared |
---|---|
Consumers |
None declared |
ADVP
Producers |
None declared |
---|---|
Consumers |
None declared |
AGENT
Producers |
None declared |
---|---|
Consumers |
None declared |
AMOD
Producers |
None declared |
---|---|
Consumers |
None declared |
APPOS
Producers |
None declared |
---|---|
Consumers |
None declared |
ATTR
Producers |
None declared |
---|---|
Consumers |
None declared |
AUX0
Producers |
None declared |
---|---|
Consumers |
None declared |
AUXPASS
Producers |
None declared |
---|---|
Consumers |
None declared |
CC
Producers |
None declared |
---|---|
Consumers |
None declared |
CCOMP
Producers |
None declared |
---|---|
Consumers |
None declared |
COMPLM
Producers |
None declared |
---|---|
Consumers |
None declared |
CONCJ
complex coordinating conjunctions such as "as well (as)" or "rather (than)"
Producers |
None declared |
---|---|
Consumers |
None declared |
CONJ
Producers |
None declared |
---|---|
Consumers |
None declared |
CONJP
Producers |
None declared |
---|---|
Consumers |
None declared |
CONJP
Producers |
None declared |
---|---|
Consumers |
None declared |
CONJ_YET
Producers |
None declared |
---|---|
Consumers |
None declared |
COP
Producers |
None declared |
---|---|
Consumers |
None declared |
CSUBJ
Producers |
None declared |
---|---|
Consumers |
None declared |
CSUBJPASS
Producers |
None declared |
---|---|
Consumers |
None declared |
DEP
Producers |
None declared |
---|---|
Consumers |
None declared |
DET
Producers |
None declared |
---|---|
Consumers |
None declared |
DOBJ
Producers |
None declared |
---|---|
Consumers |
None declared |
EXPL
Producers |
None declared |
---|---|
Consumers |
None declared |
FRAG
Producers |
None declared |
---|---|
Consumers |
None declared |
INFMOD
Producers |
None declared |
---|---|
Consumers |
None declared |
INTJ
interjection
Producers |
None declared |
---|---|
Consumers |
None declared |
INTJ
Producers |
None declared |
---|---|
Consumers |
None declared |
IOBJ
Producers |
None declared |
---|---|
Consumers |
None declared |
LST
enumeration symbol
Producers |
None declared |
---|---|
Consumers |
None declared |
LST
Producers |
None declared |
---|---|
Consumers |
None declared |
MARK
Producers |
None declared |
---|---|
Consumers |
None declared |
MEASURE
Producers |
None declared |
---|---|
Consumers |
None declared |
MWE
Producers |
None declared |
---|---|
Consumers |
None declared |
NAC
Producers |
None declared |
---|---|
Consumers |
None declared |
NC
noun chunk (non-recursive noun phrase)
Producers |
None declared |
---|---|
Consumers |
None declared |
NEG
Producers |
None declared |
---|---|
Consumers |
None declared |
NN
Producers |
None declared |
---|---|
Consumers |
None declared |
NP
Producers |
None declared |
---|---|
Consumers |
None declared |
NPADVMOD
Producers |
None declared |
---|---|
Consumers |
None declared |
NSUBJ
Producers |
None declared |
---|---|
Consumers |
None declared |
NSUBJPASS
Producers |
None declared |
---|---|
Consumers |
None declared |
NUM
Producers |
None declared |
---|---|
Consumers |
None declared |
NUMBER
Producers |
None declared |
---|---|
Consumers |
None declared |
NX
Producers |
None declared |
---|---|
Consumers |
None declared |
O
other or outside a chunk
Producers |
None declared |
---|---|
Consumers |
None declared |
PARATAXIS
Producers |
None declared |
---|---|
Consumers |
None declared |
PARN
This category is called PRN in the Penn Treebank tagset. However, PRN is a reserved device name on Windows, so the category had to be renamed. The old PRN type is still present in the DKPro Core type system, but it is deprecated, no longer used, and no JCas classes are generated for it.
Producers |
None declared |
---|---|
Consumers |
None declared |
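The naming conflict behind the PRN to PARN rename can be checked mechanically. The sketch below tests whether a file name collides with a reserved Windows device name; the list follows Microsoft's file-naming guidance in simplified form, and the function name is hypothetical.

```python
# Reserved device names on Windows (simplified list).
RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_reserved_on_windows(filename):
    # Windows ignores everything after the first dot when matching
    # device names, and the comparison is case-insensitive.
    stem = filename.split(".", 1)[0].rstrip(" ")
    return stem.upper() in RESERVED


conflict = is_reserved_on_windows("PRN.java")
renamed_ok = not is_reserved_on_windows("PARN.java")
```

A generated source file named after the type, such as `PRN.java`, would collide with the device name, whereas `PARN.java` does not.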
PARTMOD
Producers |
None declared |
---|---|
Consumers |
None declared |
PC
prepositional chunk
Producers |
None declared |
---|---|
Consumers |
None declared |
PCOMP
Producers |
None declared |
---|---|
Consumers |
None declared |
POBJ
Producers |
None declared |
---|---|
Consumers |
None declared |
POSS
Producers |
None declared |
---|---|
Consumers |
None declared |
POSSESSIVE
Producers |
None declared |
---|---|
Consumers |
None declared |
PP
Producers |
None declared |
---|---|
Consumers |
None declared |
PRECONJ
Producers |
None declared |
---|---|
Consumers |
None declared |
PRED
Producers |
None declared |
---|---|
Consumers |
None declared |
PREDET
Producers |
None declared |
---|---|
Consumers |
None declared |
PREP
Producers |
None declared |
---|---|
Consumers |
None declared |
PREPC
Producers |
None declared |
---|---|
Consumers |
None declared |
PRN
This type is no longer used and no JCas wrapper is generated for it because on Windows, it conflicts with the reserved device name for printers. @deprecated Use PARN instead
Producers |
None declared |
---|---|
Consumers |
None declared |
PRP
Producers |
None declared |
---|---|
Consumers |
None declared |
PRT
Producers |
None declared |
---|---|
Consumers |
None declared |
PRT
verb particle
Producers |
None declared |
---|---|
Consumers |
None declared |
PRT
Producers |
None declared |
---|---|
Consumers |
None declared |
PUNCT
Producers |
None declared |
---|---|
Consumers |
None declared |
PURPCL
Producers |
None declared |
---|---|
Consumers |
None declared |
QP
Producers |
None declared |
---|---|
Consumers |
None declared |
QUANTMOD
Producers |
None declared |
---|---|
Consumers |
None declared |
RCMOD
Producers |
None declared |
---|---|
Consumers |
None declared |
REF
Producers |
None declared |
---|---|
Consumers |
None declared |
REL
Producers |
None declared |
---|---|
Consumers |
None declared |
ROOT
Dependency tree root.
Producers |
None declared |
---|---|
Consumers |
None declared |
ROOT
Producers |
None declared |
---|---|
Consumers |
None declared |
RRC
Producers |
None declared |
---|---|
Consumers |
None declared |
S
Producers |
None declared |
---|---|
Consumers |
None declared |
SBAR
Producers |
None declared |
---|---|
Consumers |
None declared |
SBARQ
Producers |
None declared |
---|---|
Consumers |
None declared |
SINV
Producers |
None declared |
---|---|
Consumers |
None declared |
SQ
Producers |
None declared |
---|---|
Consumers |
None declared |
TMOD
Producers |
None declared |
---|---|
Consumers |
None declared |
UCP
Producers |
None declared |
---|---|
Consumers |
None declared |
VC
verb complex
Producers |
None declared |
---|---|
Consumers |
None declared |
VP
Producers |
None declared |
---|---|
Consumers |
None declared |
WHADJP
Producers |
None declared |
---|---|
Consumers |
None declared |
WHADVP
Producers |
None declared |
---|---|
Consumers |
None declared |
WHNP
Producers |
None declared |
---|---|
Consumers |
None declared |
WHPP
Producers |
None declared |
---|---|
Consumers |
None declared |
X
Producers |
None declared |
---|---|
Consumers |
None declared |
XCOMP
Producers |
None declared |
---|---|
Consumers |
None declared |
XSUBJ
Producers |
None declared |
---|---|
Consumers |
None declared |
Tfidf
Tfidf
Annotates the tf.idf score of a token, stem, or lemma.
tfidfValue
(Double)-
The tf.idf score.
term
(String)-
The string that was used to compute this tf.idf score. If a stem or lemma was used, the covered text of this annotation does not need to be equal to this string.
This string can be used to construct a vector space with the right terms without having to access the indexes again.
Producers |
|
---|---|
Consumers |
None declared |
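For orientation, one common tf.idf variant (raw term frequency times the natural log of inverse document frequency) can be computed as below. This is an assumed weighting chosen for illustration; the source does not state which formula the producing component uses. The terms in each document play the role of the `term` feature, for example lemmas or stems rather than covered text.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: score} dict per document, using tf * ln(N / df)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({term: tf[term] * math.log(n / df[term]) for term in tf})
    return scores


docs = [["cat", "sit", "cat"], ["dog", "sit"]]
scores = tfidf_scores(docs)
```

A term that occurs in every document, such as "sit" here, receives a score of zero under this variant.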
Topic Modeling
TopicDistribution
An array representing the topic proportions in a document.
TopicProportions
(DoubleArray)-
Each topic’s proportion in the document.
TopicAssignment
(IntegerArray)-
Pointers to topics the document has been assigned to.
Producers |
|
---|---|
Consumers |
DiTop (format) |
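How `TopicAssignment` might relate to `TopicProportions` can be sketched under an explicit assumption: that a document is assigned to every topic whose proportion reaches a threshold. The source does not describe how assignments are derived, so both the rule and the threshold value are hypothetical.

```python
def assigned_topics(proportions, threshold=0.1):
    # Return the indices of topics whose proportion reaches the threshold.
    return [i for i, p in enumerate(proportions) if p >= threshold]


topic_proportions = [0.05, 0.60, 0.30, 0.05]
topic_assignment = assigned_topics(topic_proportions)
```

The result is a list of integer indices into the proportions array, matching the IntegerArray type of the feature.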
Transformation
SofaChangeAnnotation
Encodes an edit operation that can be interpreted by the ApplyChangesAnnotator.
value
(String)-
In case of an "insert" or "replace" operation, this feature indicates the value to be inserted or replaced.
operation
(String)-
Operation to perform: "insert", "replace", "delete"
reason
(String)-
The reason for the change.
Producers |
ApplyChangesAnnotator NorvigSpellingCorrector ReplacementFileNormalizer Tcf (format) |
---|---|
Consumers |
ApplyChangesAnnotator Tcf (format) |
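The three operations can be illustrated by applying a set of edits to a string. This is a minimal sketch assuming non-overlapping changes; `Change` and `apply_changes` are hypothetical names, and the code is not the ApplyChangesAnnotator implementation.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """Hypothetical stand-in for SofaChangeAnnotation."""
    begin: int
    end: int
    operation: str  # "insert", "replace" or "delete"
    value: str = ""


def apply_changes(text, changes):
    # Apply from the end of the text so that earlier offsets stay valid.
    for c in sorted(changes, key=lambda c: c.begin, reverse=True):
        if c.operation == "insert":
            text = text[:c.begin] + c.value + text[c.begin:]
        elif c.operation == "replace":
            text = text[:c.begin] + c.value + text[c.end:]
        elif c.operation == "delete":
            text = text[:c.begin] + text[c.end:]
        else:
            raise ValueError(f"unknown operation: {c.operation}")
    return text


original = "Teh cat  sat"
changes = [
    Change(0, 3, "replace", "The"),  # fix the misspelling
    Change(7, 8, "delete"),          # drop the doubled space
    Change(12, 12, "insert", "."),   # add final punctuation
]
corrected = apply_changes(original, changes)
```

Processing the edits back to front avoids having to shift the offsets of the remaining changes after each modification.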
Utility
TimerAnnotation
Used for storing timing information (e.g. for performance testing).
startTime
(Long)-
No description
endTime
(Long)-
No description
name
(String)-
The name of the timer. Used to automatically determine whether this is an upstream or downstream timer.
Producers |
None declared |
---|---|
Consumers |
None declared |
Wikipedia
WikipediaLink
Wikipedia link
LinkType
(String)-
The type of the link, e.g. internal, external, image, …
Target
(String)-
The link target URL.
Anchor
(String)-
The anchor of the link
Producers |
WikipediaLink (format) |
---|---|
Consumers |
None declared |
Wikipedia (JWPL)
ArticleInfo
Contains basic information about the article.
Authors
(Integer)-
Number of unique authors of this article.
Revisions
(Integer)-
Number of revisions of this article.
FirstAppearance
(Long)-
The Timestamp of the first appearance of this article.
LastAppearance
(Long)-
The Timestamp of the last appearance of this article.
Producers |
WikipediaArticleInfo (format) |
---|---|
Consumers |
None declared |
DBConfig
Database configuration for the connection to the database where the CAS data was retrieved.
Host
(String)-
DB Host
DB
(String)-
Database
User
(String)-
Username
Password
(String)-
User password
Language
(String)-
Wikipedia Language Versions
Producers |
WikipediaDiscussion (format) WikipediaLink (format) WikipediaPage (format) WikipediaRevision (format) WikipediaRevisionPair (format) WikipediaTemplateFilteredArticle (format) |
---|---|
Consumers |
None declared |
WikipediaRevision
Represents a revision in Wikipedia.
revisionId
(Integer)-
The ID of the revision.
pageId
(Integer)-
The pageId of the Wikipedia page of this revision.
contributorName
(String)-
The username of the user/contributor who edited this revision.
comment
(String)-
The comment that the editor entered for this revision.
contributorId
(Integer)-
The userId of the user/contributor who created this revision.
timestamp
(Long)-
The timestamp of the revision, given in milliseconds since the standard base time (January 1, 1970, 00:00:00 GMT)
minor
(Boolean)-
Whether this revision has been marked as minor edit by its contributor.
Producers |
WikipediaRevision (format) |
---|---|
Consumers |
None declared |
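Because the `timestamp` feature is given in milliseconds since January 1, 1970, 00:00:00 GMT, it has to be divided by 1000 before being passed to APIs that expect seconds. A small sketch with a hypothetical helper name:

```python
from datetime import datetime, timezone


def revision_datetime(timestamp_ms):
    # Convert epoch milliseconds to a timezone-aware UTC datetime.
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


dt = revision_datetime(1_000_000_000_000)
iso = dt.isoformat()
```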
XML
XmlAttribute
uri
(String)-
Namespace URI of the attribute.
localName
(String)-
Local name of the attribute.
value
(String)-
Value of the XML attribute.
qName
(String)-
No description
valueType
(String)-
No description
Producers |
HtmlDocument (format) XmlDocument (format) |
---|---|
Consumers |
XmlDocument (format) |
XmlDocument
XML document
root
(XmlElement)-
Root element of the XML document.
Producers |
HtmlDocument (format) XmlDocument (format) |
---|---|
Consumers |
XmlDocument (format) |
XmlElement
XML element
uri
(String)-
Namespace URI of the element.
localName
(String)-
Local name of the XML element.
attributes
(FSArray of XmlAttribute)-
Array of attributes of the XML element.
children
(FSArray of XmlNode)-
Children of this XML element.
qName
(String)-
No description
Producers |
HtmlDocument (format) XmlDocument (format) |
---|---|
Consumers |
XmlDocument (format) |
XmlNode
Supertype for XmlElements and XmlTextNodes.
parent
(XmlElement)-
No description
Producers |
HtmlDocument (format) XmlDocument (format) |
---|---|
Consumers |
XmlDocument (format) |
Type | Description |
---|---|
XmlElement | XML element |
XmlTextNode | XML text node. |
XmlTextNode
XML text node.
text
(String)-
No description
captured
(Boolean)-
Whether the text node has been added to the document text.
Producers |
HtmlDocument (format) XmlDocument (format) |
---|---|
Consumers |
XmlDocument (format) |
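The XML types form a tree in which elements contain further elements and text nodes, and the `captured` flag marks text nodes whose text was added to the document text. A minimal sketch, using hypothetical `TextNode` and `Element` classes that omit the `parent`, `uri` and `attributes` features for brevity:

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class TextNode:
    """Hypothetical stand-in for XmlTextNode."""
    text: str
    captured: bool = True


@dataclass
class Element:
    """Hypothetical stand-in for XmlElement."""
    q_name: str
    children: List[Union["Element", TextNode]] = field(default_factory=list)


def captured_text(node):
    # Concatenate the text of all captured text nodes in document order.
    if isinstance(node, TextNode):
        return node.text if node.captured else ""
    return "".join(captured_text(child) for child in node.children)


doc = Element("html", [
    Element("head", [Element("title", [TextNode("Ignored", captured=False)])]),
    Element("body", [
        Element("p", [TextNode("Hello "), Element("b", [TextNode("world")])]),
    ]),
])
document_text = captured_text(doc)
```

Text nodes with `captured` set to false remain part of the XML structure but do not contribute to the document text.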
Subtype tables
Type | Description |
---|---|
Adjective @deprecated Use POS_ADJ instead |
|
Adposition @deprecated Use POS_ADP instead |
|
Adverb @deprecated Use POS_ADV instead |
|
Determiners and articles. |
|
Auxiliary verb @deprecated Use POS_AUX instead |
|
Numerals @deprecated Use POS_NUM instead |
|
Conjunction @deprecated Use POS_CONJ instead |
|
Determiner @deprecated Use POS_DET instead |
|
Interjection @deprecated Use POS_INTJ instead |
|
Nouns @deprecated Use POS_NOUN instead |
|
Noun @deprecated Use POS_NOUN instead |
|
Numeral @deprecated Use POS_NUM instead |
|
Catch-all for other categories such as abbreviations or foreign words @deprecated Use POS_X instead |
|
Particle @deprecated Use POS_PART instead |
|
Adjective |
|
Adposition |
|
Adverb |
|
Auxiliary verb |
|
Conjunction |
|
Determiner |
|
Interjection |
|
Noun |
|
Numeral |
|
Particle |
|
Pronoun |
|
Proper noun |
|
Punctuation |
|
Subordinating conjunction |
|
Symbol |
|
Verb |
|
Other |
|
Prepositions and postpositions @deprecated Use POS_ADP instead |
|
Pronoun @deprecated Use POS_PRON instead |
|
Pronoun @deprecated Use POS_PRON instead |
|
Proper noun @deprecated Use POS_PROPN instead |
|
Particles @deprecated Use POS_PART instead |
|
Punctuation marks @deprecated Use POS_PUNCT instead |
|
Punctuation @deprecated Use POS_PUNCT instead |
|
Subordinating conjunction @deprecated Use POS_SCONJ instead |
|
Symbol @deprecated Use POS_SYM instead |
|
Verbs @deprecated Use POS_VERB instead |
|
Verb @deprecated Use POS_VERB instead |
|
Other @deprecated Use POS_X instead |
Type | Description |
---|---|
adjective chunks |
|
adverb chunks |
|
complex coordinating conjunctions such as "as well (as)" or "rather (than)" |
|
interjection |
|
enumeration symbol |
|
noun chunk (non-recursive noun phrase) |
|
other or outside a chunk |
|
prepositional chunk |
|
verb particle |
|
verb complex |
Type | Description |
---|---|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
This category is called PRN in the Penn Treebank tagset. |
|
No description |
|
This type is no longer used and no JCas wrapper is generated for it because on Windows, it conflicts with the reserved device name for printers. |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
|
No description |
Type | Description |
---|---|
at-mention (indicates another user as a recipient of a tweet) @deprecated Use POS_AT instead |
|
discourse marker, indications of continuation of a message across multiple tweets @deprecated Use POS_DM instead |
|
emoticon @deprecated Use POS_EMO instead |
|
Hashtag (indicates topic/category for tweet) @deprecated Use POS_HASH instead |
|
proper noun + verbal @deprecated Use POS_INT instead |
|
URL or email address @deprecated Use POS_URL instead |
Type | Description |
---|---|
A document annotation that describes the metadata of a newspaper article. |
|
Marks the beginning of a chain. |
Type | Description |
---|---|
A CompoundPart represents one fragment from the compounding word. |
|
This type represents a linking morpheme between two CompoundParts. |
Type | Description |
---|---|
No description |
|
No description |
Type | Description |
---|---|
No description |
|
Document title, section heading, etc. |
|
No description |
Type | Description |
---|---|
The DocumentMetaData annotation stores information about a single processed document. |
Type | Description |
---|---|
Discourse relation |
Type | Description |
---|---|
No description |
|
No description |
Type | Description |
---|---|
Implicit discourse relation |
Type | Description |
---|---|
No description |
Type | Description |
---|---|
Common noun @deprecated Use POS_NOUN instead |
|
nominal + verbal @deprecated Use POS_NNV instead |
|
Proper noun @deprecated Use POS_PROPN instead |
|
proper noun + verbal @deprecated Use POS_NPV instead |
Type | Description |
---|---|
at-mention (indicates another user as a recipient of a tweet) |
|
discourse marker, indications of continuation of a message across multiple tweets |
|
emoticon |
|
Hashtag (indicates topic/category for tweet) |
|
proper noun + verbal |
|
URL or email address |
Type | Description |
---|---|
nominal + verbal |
|
proper noun + verbal |
Type | Description |
---|---|
The SemArgLink type is used to attach SemPred annotations to their respective SemArg annotations while giving each link a role. |
|
Description of an individual tag. |
|
No description |
Type | Description |
---|---|
XML element |
|
XML text node. |