public class DiTopWriter extends JCasFileWriter_ImplBase
Requires JCas input annotated by MalletLdaTopicModelInferencer using the same model.
Nested classes inherited from class JCasFileWriter_ImplBase: JCasFileWriter_ImplBase.NamedOutputStream
| Modifier and Type | Field and Description |
|---|---|
| protected boolean | appendConfig |
| protected String[] | collectionValues |
| protected boolean | collectionValuesExactMatch |
| protected Set<String> | collectionValuesSet |
| protected String | corpusName |
| protected File | modelLocation |
| static String | PARAM_APPEND_CONFIG: If set to true, the new corpus will be appended to an existing config file. |
| static String | PARAM_COLLECTION_VALUES: If set, only documents with one of the listed collection IDs are written; all others are ignored. |
| static String | PARAM_COLLECTION_VALUES_EXACT_MATCH: If true (default), only documents whose collection IDs exactly match one of the collection values are written. |
| static String | PARAM_CORPUS_NAME: The corpus name is used to name the corresponding sub-directory and is also set in the configuration file. |
| static String | PARAM_MAX_TOPIC_WORDS: The maximum number of topic words to extract. |
| static String | PARAM_MODEL_LOCATION: A Mallet file storing a serialized ParallelTopicModel. |
| static String | PARAM_TARGET_LOCATION: Directory in which to store output files. |
| protected File | targetLocation |
| protected BufferedWriter | writerDocTopic |
Fields inherited from class JCasFileWriter_ImplBase: JAR_PREFIX, PARAM_COMPRESSION, PARAM_ESCAPE_DOCUMENT_ID, PARAM_OVERWRITE, PARAM_SINGULAR_TARGET, PARAM_STRIP_EXTENSION, PARAM_USE_DOCUMENT_ID
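The two collection parameters together act as a document filter. The sketch below is a hypothetical, self-contained reading of the documented behavior, not the DKPro source: if no collection values are set, every document passes; in exact-match mode (the documented default) the collection ID must equal one of the values. The non-exact branch, which here accepts any value containing the lowercased ID, is an assumption borrowed from the expandCollectionId description. The names `CollectionFilter` and `accepts` are illustrative only.

```java
import java.util.Locale;

// Hypothetical stand-in for PARAM_COLLECTION_VALUES / PARAM_COLLECTION_VALUES_EXACT_MATCH.
// Not the DKPro implementation; the non-exact branch is an assumption.
class CollectionFilter {
    private final String[] values;     // configured collection values (empty = no filtering)
    private final boolean exactMatch;  // documented default is true

    CollectionFilter(String[] collectionValues, boolean exactMatch) {
        this.values = collectionValues == null ? new String[0] : collectionValues.clone();
        this.exactMatch = exactMatch;
    }

    /** Returns true if a document with this collection ID should be written. */
    boolean accepts(String collectionId) {
        if (values.length == 0) {
            return true; // parameter not set: write every document
        }
        String needle = collectionId.toLowerCase(Locale.ROOT);
        for (String value : values) {
            if (exactMatch) {
                if (value.equals(collectionId)) {
                    return true; // exact, case-sensitive match
                }
            } else if (value.toLowerCase(Locale.ROOT).contains(needle)) {
                return true; // assumed substring match on lowercased text
            }
        }
        return false; // all other documents are ignored
    }
}
```

With values `{"news", "blogs"}`, exact mode accepts `"news"` but not `"new"`, while the assumed non-exact mode also accepts `"NEW"`.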
| Constructor and Description |
|---|
| DiTopWriter() |
| Modifier and Type | Method and Description |
|---|---|
| void | collectionProcessComplete() |
| protected String | expandCollectionId(String collectionId): This method checks whether any of the specified collection values contains the given String. |
| protected String | getCollectionId(org.apache.uima.jcas.JCas aJCas): Extract the collection id from the JCas. |
| protected String | getDocumentId(org.apache.uima.jcas.JCas aJCas): Extract the document id from the JCas. |
| void | initialize(org.apache.uima.UimaContext context) |
| void | process(org.apache.uima.jcas.JCas aJCas) |
| protected void | writeDocTopic(TopicDistribution distribution, String docName, String collectionId) |
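The expandCollectionId contract can be illustrated without UIMA. This is a minimal sketch assuming that "contains" means plain substring containment and that the first matching value in iteration order is returned; `CollectionIdExpander` is a hypothetical class, and only the method name and return contract come from this page.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the documented expandCollectionId contract; not the DKPro source.
class CollectionIdExpander {
    // LinkedHashSet keeps insertion order, so "first match" is well defined here (assumption).
    private final Set<String> collectionValuesSet = new LinkedHashSet<>();

    CollectionIdExpander(String... collectionValues) {
        for (String value : collectionValues) {
            collectionValuesSet.add(value);
        }
    }

    /**
     * Returns the first collection value that contains the lowercased collectionId,
     * or the input collectionId if no value contains it.
     */
    String expandCollectionId(String collectionId) {
        String needle = collectionId.toLowerCase(Locale.ROOT);
        for (String value : collectionValuesSet) {
            if (value.contains(needle)) {
                return value;
            }
        }
        return collectionId;
    }
}
```

For example, with the values `"newswire-2018"` and `"blogs"`, the ID `"NEWSWIRE"` expands to `"newswire-2018"`, while `"forum"` is returned unchanged.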
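Methods inherited from class JCasFileWriter_ImplBase: getCompressionMethod, getOutputStream, getOutputStream, getRelativePath, getTargetLocation, isStripExtension, isUseDocumentId
Methods inherited from class org.apache.uima.analysis_component.JCasAnnotator_ImplBase: getRequiredCasInterface, process
Methods inherited from class org.apache.uima.analysis_component.Annotator_ImplBase: getCasInstancesRequired, hasNext, next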
public static final String PARAM_MAX_TOPIC_WORDS
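PARAM_MAX_TOPIC_WORDS caps how many words are listed per topic. As a hedged illustration of that cut-off only (not of how DiTopWriter reads words from the Mallet model), the sketch ranks a hypothetical word-to-weight map and keeps at most `maxTopicWords` entries. `TopicWords.topWords` and the alphabetical tie-break are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating a "maximum number of topic words" cut-off.
class TopicWords {
    /** Returns up to maxTopicWords words, highest weight first, ties broken alphabetically. */
    static List<String> topWords(Map<String, Double> wordWeights, int maxTopicWords) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(wordWeights.entrySet());
        entries.sort((a, b) -> {
            int byWeight = Double.compare(b.getValue(), a.getValue()); // descending weight
            return byWeight != 0 ? byWeight : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(maxTopicWords, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```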
public static final String PARAM_MODEL_LOCATION
A Mallet file storing a serialized ParallelTopicModel.
protected File modelLocation
public static final String PARAM_CORPUS_NAME
protected String corpusName
public static final String PARAM_TARGET_LOCATION
protected File targetLocation
public static final String PARAM_APPEND_CONFIG
protected boolean appendConfig
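The corpus-name, target-location and append-config fields describe the output layout: a sub-directory of the target location named after the corpus, plus a config file that is either appended to or replaced. The sketch below shows those two mechanics with `java.nio.file` under loud assumptions: the class `OutputLayout`, the config file name used in practice and its one-corpus-per-line format are hypothetical, not the DiTop config format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;

// Hypothetical sketch of the output-layout parameters; file format is an assumption.
class OutputLayout {
    /** Creates (if needed) and returns targetLocation/corpusName. */
    static Path corpusDirectory(Path targetLocation, String corpusName) throws IOException {
        return Files.createDirectories(targetLocation.resolve(corpusName));
    }

    /** Records corpusName in configFile, appending if appendConfig is true, else overwriting. */
    static void registerCorpus(Path configFile, String corpusName, boolean appendConfig)
            throws IOException {
        StandardOpenOption mode = appendConfig
                ? StandardOpenOption.APPEND
                : StandardOpenOption.TRUNCATE_EXISTING;
        Files.write(configFile, Collections.singletonList(corpusName), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, mode);
    }
}
```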
public static final String PARAM_COLLECTION_VALUES
protected String[] collectionValues
public static final String PARAM_COLLECTION_VALUES_EXACT_MATCH
protected boolean collectionValuesExactMatch
protected BufferedWriter writerDocTopic
public void initialize(org.apache.uima.UimaContext context) throws org.apache.uima.resource.ResourceInitializationException
Specified by: initialize in interface org.apache.uima.analysis_component.AnalysisComponent
Overrides: initialize in class org.apache.uima.fit.component.JCasConsumer_ImplBase
Throws: org.apache.uima.resource.ResourceInitializationException
public void process(org.apache.uima.jcas.JCas aJCas) throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
Specified by: process in class org.apache.uima.analysis_component.JCasAnnotator_ImplBase
Throws: org.apache.uima.analysis_engine.AnalysisEngineProcessException
protected void writeDocTopic(TopicDistribution distribution, String docName, String collectionId) throws IOException
Throws: IOException
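writeDocTopic emits one document's topic distribution through the writerDocTopic BufferedWriter. This page does not document the line format, so the sketch below uses an assumed comma-separated layout (document name, collection ID, then one proportion per topic, four decimals, no quoting) and a plain `double[]` in place of the TopicDistribution annotation. Treat `DocTopicLine` and the format as hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.util.Locale;

// Hypothetical sketch: the real DiTop doc-topic format is not specified on this page.
class DocTopicLine {
    /** Writes "docName,collectionId,p0,p1,..." followed by a platform line separator. */
    static void writeDocTopic(BufferedWriter writer, double[] proportions, String docName,
            String collectionId) throws IOException {
        StringBuilder line = new StringBuilder(docName).append(',').append(collectionId);
        for (double p : proportions) {
            line.append(',').append(String.format(Locale.ROOT, "%.4f", p));
        }
        writer.write(line.toString());
        writer.newLine();
    }
}
```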
public void collectionProcessComplete() throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
Specified by: collectionProcessComplete in interface org.apache.uima.analysis_component.AnalysisComponent
Overrides: collectionProcessComplete in class JCasFileWriter_ImplBase
Throws: org.apache.uima.analysis_engine.AnalysisEngineProcessException
protected String getCollectionId(org.apache.uima.jcas.JCas aJCas)
Extract the collection id from the JCas. By default, the value of DocumentMetaData.getCollectionId() is used, but this method can be overridden to select a different source for the collection id.
Parameters: aJCas - the JCas.
protected String expandCollectionId(String collectionId)
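This method checks whether any of the specified collection values contains the given String.
Parameters: collectionId - the collection ID.
Returns: the value in collectionValuesSet that contains the (lowercased) collectionId, or else the input collectionId.
protected String getDocumentId(org.apache.uima.jcas.JCas aJCas) throws IllegalStateException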
Extract the document id from the JCas. By default, the value of DocumentMetaData.getDocumentId() is used, but this method can be overridden to select a different source for the document id.
Parameters: aJCas - the JCas.
Throws: IllegalStateException
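Both getCollectionId and getDocumentId are protected hooks meant to be overridden. To show the pattern without UIMA on the classpath, the sketch uses hypothetical stand-ins: a `Map` plays the role of the JCas metadata, `MetadataWriter` mimics the default behavior, and `UriBasedWriter` overrides the hook to take the collection id from the parent folder of an assumed `documentUri` entry. None of these types or keys are part of the DKPro or UIMA API.

```java
import java.util.Map;

// Hypothetical stand-in for the default hook: read the collection id from metadata.
class MetadataWriter {
    protected String getCollectionId(Map<String, String> metadata) {
        return metadata.get("collectionId");
    }
}

// Hypothetical subclass selecting a different source: the parent folder of the document URI.
class UriBasedWriter extends MetadataWriter {
    @Override
    protected String getCollectionId(Map<String, String> metadata) {
        String uri = metadata.get("documentUri");
        int end = uri == null ? -1 : uri.lastIndexOf('/');
        if (end < 0) {
            return super.getCollectionId(metadata); // fall back to the default source
        }
        int start = uri.lastIndexOf('/', end - 1);
        return uri.substring(start + 1, end);
    }
}
```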
Copyright © 2007–2018 Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt. All rights reserved.