public abstract class MalletModelTrainer extends JCasFileWriter_ImplBase
This class creates a Mallet InstanceList from the input documents so that inheriting estimators
can build a model from it, typically by overriding the
JCasFileWriter_ImplBase.collectionProcessComplete() method.
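The division of labor described above (collect instances while documents are processed, estimate the model once the whole collection has been seen) is the template-method pattern. The sketch below illustrates it in plain Java. It does not use DKPro, UIMA or Mallet: ModelTrainerSketch and TokenCountTrainer are hypothetical names, whitespace tokenization is an assumption, and a list of token lists stands in for Mallet's InstanceList.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for MalletModelTrainer: collects one token sequence per
// document and leaves model estimation to subclasses (template-method pattern).
abstract class ModelTrainerSketch {
    // Stand-in for Mallet's InstanceList: one token list per document.
    private final List<List<String>> instances = new ArrayList<>();

    // Analogous to process(JCas): called once per input document.
    // Whitespace tokenization is an assumption made for this sketch only.
    public void process(String documentText) {
        List<String> tokens = new ArrayList<>();
        for (String t : documentText.split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        instances.add(tokens);
    }

    // Analogous to getInstanceList(): read-only view of everything collected so far.
    public List<List<String>> getInstanceList() {
        return Collections.unmodifiableList(instances);
    }

    // Subclasses estimate their model here, after all documents have been processed.
    public abstract void collectionProcessComplete();
}

// Hypothetical subclass whose "model" is simply a token frequency table.
class TokenCountTrainer extends ModelTrainerSketch {
    final Map<String, Integer> counts = new HashMap<>();

    @Override
    public void collectionProcessComplete() {
        for (List<String> doc : getInstanceList()) {
            for (String token : doc) {
                counts.merge(token, 1, Integer::sum);
            }
        }
    }
}
```

Calling process("a b a") and process("b c") followed by collectionProcessComplete() leaves counts mapping a to 2, b to 2 and c to 1. Keeping the instance list in the base class means each subclass only has to implement the estimation step.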
Direct Known Subclasses: MalletEmbeddingsTrainer, MalletLdaTopicModelTrainer

Nested classes inherited from class JCasFileWriter_ImplBase: JCasFileWriter_ImplBase.NamedOutputStream

| Modifier and Type | Field and Description |
|---|---|
| static String | PARAM_COVERING_ANNOTATION_TYPE: If specified, the text contained in annotations of the given segmentation type is fed as separate units ("documents") to the model estimator, e.g. de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence. |
| static String | PARAM_FILTER_REGEX: Regular expression matching tokens to be filtered. |
| static String | PARAM_FILTER_REGEX_REPLACEMENT: Value with which tokens matching PARAM_FILTER_REGEX are replaced. |
| static String | PARAM_LOWERCASE: If set to true (default: false), all tokens are lowercased. |
| static String | PARAM_MIN_TOKEN_LENGTH: Ignore tokens (or any other annotation type, as specified by PARAM_TOKEN_FEATURE_PATH) that are shorter than the given value. |
| static String | PARAM_NUM_THREADS: The number of threads to use during model estimation. |
| static String | PARAM_STOPWORDS_FILE: The location of the stopwords file. |
| static String | PARAM_STOPWORDS_REPLACEMENT: If set, stopwords found in the PARAM_STOPWORDS_FILE location are not removed but replaced by the given string (e.g. STOP). |
| static String | PARAM_TOKEN_FEATURE_PATH: The annotation type to use as input tokens for the model estimation. |
| static String | PARAM_USE_CHARACTERS: If true (default: false), estimate character embeddings. |
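Several of the fields above act on individual tokens. The plain-Java sketch below shows one way they could be combined into a single filtering step. TokenFilterSketch is a hypothetical name, and the following are assumptions rather than documented behavior of MalletModelTrainer: the order of the steps, matching stopwords after optional lowercasing, requiring the regular expression to match the whole token, and dropping regex matches when no replacement value is configured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch of one way the token-level parameters could be combined.
// Step order, stopword matching after optional lowercasing, whole-token regex
// matching and "drop if no replacement" for the regex filter are assumptions.
final class TokenFilterSketch {
    private final boolean lowercase;          // cf. PARAM_LOWERCASE
    private final int minTokenLength;         // cf. PARAM_MIN_TOKEN_LENGTH
    private final Set<String> stopwords;      // contents of the PARAM_STOPWORDS_FILE file
    private final String stopwordReplacement; // cf. PARAM_STOPWORDS_REPLACEMENT; null = remove
    private final Pattern filterRegex;        // cf. PARAM_FILTER_REGEX; null = no filtering
    private final String filterReplacement;   // cf. PARAM_FILTER_REGEX_REPLACEMENT; null = remove

    TokenFilterSketch(boolean lowercase, int minTokenLength, Set<String> stopwords,
            String stopwordReplacement, String filterRegex, String filterReplacement) {
        this.lowercase = lowercase;
        this.minTokenLength = minTokenLength;
        this.stopwords = stopwords;
        this.stopwordReplacement = stopwordReplacement;
        this.filterRegex = filterRegex == null ? null : Pattern.compile(filterRegex);
        this.filterReplacement = filterReplacement;
    }

    List<String> apply(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            String t = lowercase ? token.toLowerCase(Locale.ROOT) : token;
            if (t.length() < minTokenLength) {
                continue; // shorter than the minimum length: ignored
            }
            if (stopwords.contains(t)) {
                if (stopwordReplacement != null) {
                    out.add(stopwordReplacement); // replaced instead of removed
                }
                continue;
            }
            if (filterRegex != null && filterRegex.matcher(t).matches()) {
                if (filterReplacement != null) {
                    out.add(filterReplacement);
                }
                continue;
            }
            out.add(t);
        }
        return out;
    }
}
```

For example, with lowercasing on, a minimum length of 3, the stopword "the" replaced by STOP and tokens matching [0-9]+ replaced by NUM, the input The, Cat, is, 1999, old becomes STOP, cat, NUM, old.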
Fields inherited from class JCasFileWriter_ImplBase: JAR_PREFIX, PARAM_COMPRESSION, PARAM_ESCAPE_FILENAME, PARAM_OVERWRITE, PARAM_SINGULAR_TARGET, PARAM_STRIP_EXTENSION, PARAM_TARGET_LOCATION, PARAM_USE_DOCUMENT_ID

| Constructor and Description |
|---|
| MalletModelTrainer() |

| Modifier and Type | Method and Description |
|---|---|
| cc.mallet.types.InstanceList | getInstanceList() |
| protected int | getNumThreads() |
| void | initialize(org.apache.uima.UimaContext context) |
| void | process(org.apache.uima.jcas.JCas aJCas) |
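Methods inherited from class JCasFileWriter_ImplBase: collectionProcessComplete, getCompressionMethod, getOutputStream, getOutputStream, getRelativePath, getTargetLocation, isStripExtension, isUseDocumentId

Methods inherited from class org.apache.uima.analysis_component.JCasAnnotator_ImplBase: getRequiredCasInterface, process

Methods inherited from class org.apache.uima.analysis_component.Annotator_ImplBase: getCasInstancesRequired, hasNext, next

public static final String PARAM_TOKEN_FEATURE_PATH
The annotation type to use as input tokens for the model estimation. A feature path may be given instead, e.g. de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token/lemma/value to use the lemma of each token.

public static final String PARAM_NUM_THREADS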
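The number of threads to use during model estimation. If not set, the number of threads is determined automatically by ComponentParameters.computeNumThreads(int).

Warning: do not set this to more than 1 when using very small (test) data sets with MalletEmbeddingsTrainer, as this might prevent the process from terminating.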
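How an unset or non-positive thread count is turned into a concrete number is decided by ComponentParameters.computeNumThreads(int), whose exact rule is not given here. The sketch below only illustrates the general idea under an assumed rule: a positive request is used as-is, while zero or a negative value means "all available processors minus the absolute value", never fewer than one. NumThreadsSketch is a hypothetical name.

```java
// Hypothetical resolution rule, assumed for illustration; it is not necessarily
// what ComponentParameters.computeNumThreads(int) does.
final class NumThreadsSketch {
    // requested > 0: used as-is.
    // requested <= 0: all available processors minus |requested|, but at least one thread.
    static int resolve(int requested, int availableProcessors) {
        if (requested > 0) {
            return requested;
        }
        return Math.max(1, availableProcessors + requested);
    }
}
```

In practice the processor count would come from Runtime.getRuntime().availableProcessors(); passing it in as an argument keeps the rule deterministic and easy to test. Given the warning above, a caller training MalletEmbeddingsTrainer on a tiny data set would request exactly 1.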
public static final String PARAM_MIN_TOKEN_LENGTH
Ignore tokens (or any other annotation type, as specified by PARAM_TOKEN_FEATURE_PATH) that are shorter than the given value.

public static final String PARAM_COVERING_ANNOTATION_TYPE
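If specified, the text contained in annotations of the given segmentation type is fed as separate units ("documents") to the model estimator, e.g. de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence. Text that is not within such annotations is ignored.

By default, the full text is used as a document.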
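The segmentation rule for PARAM_COVERING_ANNOTATION_TYPE can be pictured with character offsets alone. In this plain-Java sketch, CoveringSpan is a hypothetical stand-in for a covering annotation such as a Sentence (only its begin and end offsets), CoveringSegmentationSketch is likewise hypothetical, and a null span list models "no covering type configured".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a covering annotation (e.g. a Sentence): offsets only.
final class CoveringSpan {
    final int begin;
    final int end;

    CoveringSpan(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

final class CoveringSegmentationSketch {
    // Returns the units that would be fed to the estimator as separate "documents".
    // coveringSpans == null: no covering type configured, so the full text is one unit.
    // Otherwise each span yields one unit, and text outside all spans is dropped.
    static List<String> units(String text, List<CoveringSpan> coveringSpans) {
        List<String> result = new ArrayList<>();
        if (coveringSpans == null) {
            result.add(text);
            return result;
        }
        for (CoveringSpan span : coveringSpans) {
            result.add(text.substring(span.begin, span.end));
        }
        return result;
    }
}
```

With the text "Hello world. Bye now. [x]" and spans covering only the two sentences, the trailing "[x]" never reaches the estimator.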
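public static final String PARAM_USE_CHARACTERS
If true (default: false), estimate character embeddings. In this case, PARAM_TOKEN_FEATURE_PATH is ignored.

public static final String PARAM_LOWERCASE
If set to true (default: false), all tokens are lowercased.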
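PARAM_USE_CHARACTERS and PARAM_LOWERCASE both change which units reach the estimator. The plain-Java sketch below illustrates the interaction: in character mode the token list (what PARAM_TOKEN_FEATURE_PATH would select) is not consulted at all. CharacterUnitsSketch is a hypothetical name, and skipping whitespace characters, splitting by Unicode code point and lowercasing with Locale.ROOT are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class CharacterUnitsSketch {
    // useCharacters == true: every non-whitespace code point becomes a unit and the
    // token list is ignored (whitespace skipping is an assumption of this sketch).
    // useCharacters == false: the given tokens are used as units.
    static List<String> units(String text, List<String> tokens,
            boolean useCharacters, boolean lowercase) {
        List<String> out = new ArrayList<>();
        if (useCharacters) {
            text.codePoints()
                .filter(cp -> !Character.isWhitespace(cp))
                .forEach(cp -> out.add(new String(Character.toChars(cp))));
        } else {
            out.addAll(tokens);
        }
        if (lowercase) {
            out.replaceAll(s -> s.toLowerCase(Locale.ROOT));
        }
        return out;
    }
}
```

For the text "Ab c" with tokens Ab and c, character mode with lowercasing yields a, b, c, while token mode without lowercasing yields Ab, c.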
public static final String PARAM_STOPWORDS_FILE
The location of the stopwords file.
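The stopwords file format is not specified on this page. The sketch below assumes a plain-text file with one stopword per line, where blank lines and lines starting with # are skipped; StopwordsSketch is a hypothetical name. It reads from a java.io.Reader so it can be tested without a file; a real file could be opened with java.nio.file.Files.newBufferedReader(path).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashSet;
import java.util.Set;

final class StopwordsSketch {
    // Assumed format: one stopword per line; surrounding whitespace is trimmed,
    // blank lines and lines starting with '#' are skipped.
    static Set<String> load(Reader in) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty() && !word.startsWith("#")) {
                    words.add(word);
                }
            }
        }
        return words;
    }
}
```

A Set gives constant-time lookups when every token is checked against the list.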
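public static final String PARAM_STOPWORDS_REPLACEMENT
If set, stopwords found in the PARAM_STOPWORDS_FILE location are not removed but replaced by the given string (e.g. STOP).

public static final String PARAM_FILTER_REGEX
Regular expression matching tokens to be filtered.

public static final String PARAM_FILTER_REGEX_REPLACEMENT
Value with which tokens matching PARAM_FILTER_REGEX are replaced.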
public void initialize(org.apache.uima.UimaContext context)
                throws org.apache.uima.resource.ResourceInitializationException
Specified by: initialize in interface org.apache.uima.analysis_component.AnalysisComponent
Overrides: initialize in class org.apache.uima.fit.component.JCasConsumer_ImplBase
Throws: org.apache.uima.resource.ResourceInitializationException

public void process(org.apache.uima.jcas.JCas aJCas)
             throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
Specified by: process in class org.apache.uima.analysis_component.JCasAnnotator_ImplBase
Throws: org.apache.uima.analysis_engine.AnalysisEngineProcessException

protected int getNumThreads()
public cc.mallet.types.InstanceList getInstanceList()
Copyright © 2007–2019 Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt. All rights reserved.