public class MalletLdaTopicModelTrainer extends MalletModelTrainer
Stores all incoming documents as Mallet Instances before estimating the model, using a ParallelTopicModel.
Set MalletModelTrainer.PARAM_TOKEN_FEATURE_PATH to define what is considered a token (tokens, lemmas, etc.).
Set MalletModelTrainer.PARAM_COVERING_ANNOTATION_TYPE to define what is considered a document (sentences, paragraphs, etc.).
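As a rough illustration of how these two settings might be supplied, the sketch below assembles the alternating name/value list that uimaFIT factory methods such as `AnalysisEngineFactory.createEngineDescription(Class, Object...)` accept. To stay free of external dependencies, the parameter names are stand-in string literals and the DKPro type names are assumed example values, none of them taken from this page. Real code would pass the `MalletModelTrainer.PARAM_*` constants instead. The helper class itself is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: collects trainer settings as name/value pairs. */
final class TrainerSettingsSketch {
    // Stand-ins for MalletModelTrainer.PARAM_TOKEN_FEATURE_PATH and
    // MalletModelTrainer.PARAM_COVERING_ANNOTATION_TYPE. The actual string
    // values of those constants are an assumption here.
    static final String TOKEN_FEATURE_PATH = "tokenFeaturePath";
    static final String COVERING_ANNOTATION_TYPE = "coveringAnnotationType";

    // LinkedHashMap keeps the settings in insertion order.
    private final Map<String, Object> settings = new LinkedHashMap<>();

    /** Records one setting and returns this object so calls can be chained. */
    TrainerSettingsSketch set(String name, Object value) {
        settings.put(name, value);
        return this;
    }

    /** Flattens the settings into [name1, value1, name2, value2, ...]. */
    Object[] toConfigurationData() {
        List<Object> flat = new ArrayList<>();
        for (Map.Entry<String, Object> entry : settings.entrySet()) {
            flat.add(entry.getKey());
            flat.add(entry.getValue());
        }
        return flat.toArray();
    }

    public static void main(String[] args) {
        Object[] config = new TrainerSettingsSketch()
                // Treat lemmas as tokens (assumed example feature path).
                .set(TOKEN_FEATURE_PATH,
                        "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token/lemma/value")
                // Treat each sentence as one document (assumed example type name).
                .set(COVERING_ANNOTATION_TYPE,
                        "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence")
                .toConfigurationData();
        System.out.println(Arrays.toString(config));
    }
}
```

The resulting array could then be handed to the factory method together with `MalletLdaTopicModelTrainer.class`, followed by the model-specific parameters listed in the field summary.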
Nested classes/interfaces inherited from class JCasFileWriter_ImplBase: JCasFileWriter_ImplBase.NamedOutputStream
Modifier and Type | Field and Description |
---|---|
static String | PARAM_ALPHA_SUM: The sum of alphas over all topics. |
static String | PARAM_BETA: Beta for a single dimension of the Dirichlet prior. |
static String | PARAM_BURNIN_PERIOD: The number of iterations before hyper-parameter optimization begins. |
static String | PARAM_DISPLAY_INTERVAL: The interval in which to display the estimated topics. |
static String | PARAM_DISPLAY_N_TOPIC_WORDS: The number of top words to display during estimation. |
static String | PARAM_N_ITERATIONS: The number of iterations during model estimation. |
static String | PARAM_N_TOPICS: The number of topics to estimate. |
static String | PARAM_OPTIMIZE_INTERVAL: Interval for optimizing Dirichlet hyper-parameters. |
static String | PARAM_RANDOM_SEED: Set random seed. |
static String | PARAM_SAVE_INTERVAL: Define how frequently a serialized model is saved to disk during estimation. |
static String | PARAM_USE_SYMMETRIC_ALPHA: Use a symmetric alpha value during model estimation? Default: false. |
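To make the interplay of PARAM_N_ITERATIONS, PARAM_BURNIN_PERIOD and PARAM_OPTIMIZE_INTERVAL more concrete, here is a minimal sketch. It assumes that hyper-parameter optimization runs on every iteration that lies after the burn-in period and is a multiple of the optimize interval, and that an interval of 0 disables optimization. That rule is an assumption about the underlying sampler, not something this page states, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of when hyper-parameter optimization would fire. */
final class OptimizationScheduleSketch {

    /**
     * Lists the iterations (1-based) on which optimization is assumed to run:
     * those greater than burninPeriod and divisible by optimizeInterval.
     */
    static List<Integer> optimizationIterations(int nIterations, int burninPeriod,
            int optimizeInterval) {
        List<Integer> result = new ArrayList<>();
        if (optimizeInterval <= 0) {
            return result; // assumed: 0 (or less) switches optimization off
        }
        for (int iteration = 1; iteration <= nIterations; iteration++) {
            if (iteration > burninPeriod && iteration % optimizeInterval == 0) {
                result.add(iteration);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Example values chosen for illustration only.
        System.out.println(optimizationIterations(100, 20, 10));
    }
}
```

Under this assumed rule, a burn-in period longer than the total number of iterations means the hyper-parameters are never re-estimated.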
Fields inherited from class MalletModelTrainer: PARAM_COVERING_ANNOTATION_TYPE, PARAM_FILTER_REGEX, PARAM_FILTER_REGEX_REPLACEMENT, PARAM_LOWERCASE, PARAM_MIN_TOKEN_LENGTH, PARAM_NUM_THREADS, PARAM_STOPWORDS_FILE, PARAM_STOPWORDS_REPLACEMENT, PARAM_TOKEN_FEATURE_PATH, PARAM_USE_CHARACTERS
Fields inherited from class JCasFileWriter_ImplBase: JAR_PREFIX, PARAM_COMPRESSION, PARAM_ESCAPE_DOCUMENT_ID, PARAM_OVERWRITE, PARAM_SINGULAR_TARGET, PARAM_STRIP_EXTENSION, PARAM_TARGET_LOCATION, PARAM_USE_DOCUMENT_ID
Constructor and Description |
---|
MalletLdaTopicModelTrainer() |
Modifier and Type | Method and Description |
---|---|
void | collectionProcessComplete() |
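Methods inherited from class MalletModelTrainer: getInstanceList, getNumThreads, initialize, process
Methods inherited from class JCasFileWriter_ImplBase: getCompressionMethod, getOutputStream, getOutputStream, getRelativePath, getTargetLocation, isStripExtension, isUseDocumentId
Methods inherited from class org.apache.uima.analysis_component.JCasAnnotator_ImplBase: getRequiredCasInterface, process
Methods inherited from class org.apache.uima.analysis_component.Annotator_ImplBase: getCasInstancesRequired, hasNext, next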
public static final String PARAM_N_TOPICS
public static final String PARAM_N_ITERATIONS
public static final String PARAM_BURNIN_PERIOD
public static final String PARAM_OPTIMIZE_INTERVAL
public static final String PARAM_RANDOM_SEED
public static final String PARAM_SAVE_INTERVAL
public static final String PARAM_USE_SYMMETRIC_ALPHA
public static final String PARAM_DISPLAY_INTERVAL
public static final String PARAM_DISPLAY_N_TOPIC_WORDS
public static final String PARAM_ALPHA_SUM
The sum of alphas over all topics. One recommended value is 50 / T, where T is the number of topics.
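The 50 / T rule is easy to get wrong in Java, because `50 / T` with an `int` T performs integer division and silently yields 0 for any T above 50. The sketch below computes the value in floating point and, assuming a symmetric prior in which the sum is split evenly across topics, derives the per-topic alpha. The class and method names are hypothetical.

```java
/** Hypothetical helper for the 50 / T recommendation. */
final class AlphaSumSketch {

    /** Returns 50 / T as a double, avoiding integer division. */
    static double recommendedAlphaSum(int nTopics) {
        if (nTopics <= 0) {
            throw new IllegalArgumentException("nTopics must be positive");
        }
        return 50.0 / nTopics;
    }

    /** Per-topic alpha when the sum is shared equally (symmetric prior assumed). */
    static double perTopicAlpha(double alphaSum, int nTopics) {
        return alphaSum / nTopics;
    }

    public static void main(String[] args) {
        int nTopics = 100; // example value for illustration
        double alphaSum = recommendedAlphaSum(nTopics);
        System.out.println("alphaSum = " + alphaSum);
        System.out.println("alpha per topic = " + perTopicAlpha(alphaSum, nTopics));
    }
}
```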
public static final String PARAM_BETA
public void collectionProcessComplete() throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
Specified by: collectionProcessComplete in interface org.apache.uima.analysis_component.AnalysisComponent
Overrides: collectionProcessComplete in class JCasFileWriter_ImplBase
Throws: org.apache.uima.analysis_engine.AnalysisEngineProcessException
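Since documents are gathered as instances before the model is estimated, the estimation step plausibly belongs in this end-of-collection callback. The following sketch shows that collect-then-estimate lifecycle in isolation. It is a hypothetical stand-in with no UIMA or Mallet dependency, and the "estimation" is only a placeholder that counts tokens.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the collect-then-estimate lifecycle; not the DKPro class. */
final class CollectThenEstimateSketch {
    private final List<String[]> instances = new ArrayList<>();
    private boolean estimated = false;

    /** Called once per document: only buffers the whitespace-separated tokens. */
    void process(String documentText) {
        if (estimated) {
            throw new IllegalStateException("estimation has already run");
        }
        instances.add(documentText.trim().split("\\s+"));
    }

    /**
     * Called once after the last document. A real trainer would estimate and
     * save the model here; this placeholder just returns the token count.
     */
    int collectionProcessComplete() {
        estimated = true;
        int tokens = 0;
        for (String[] document : instances) {
            tokens += document.length;
        }
        return tokens;
    }

    public static void main(String[] args) {
        CollectThenEstimateSketch trainer = new CollectThenEstimateSketch();
        trainer.process("topic models group words");
        trainer.process("words form topics");
        System.out.println("tokens seen: " + trainer.collectionProcessComplete());
    }
}
```

One consequence of this pattern is that all instances are held in memory until the collection has been fully processed.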
Copyright © 2007–2018 Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt. All rights reserved.