public class BinaryCasWriter extends JCasFileWriter_ImplBase
Write CAS in one of the UIMA binary formats. All supported formats except 6+ can also be loaded and saved via the UIMA CasIOUtils.
Format | Description | Type system on load | CAS Addresses preserved |
---|---|---|---|
SERIALIZED or S | CAS structures are dumped to disk as they are using Java serialization (CASSerializer). Because these structures are pre-allocated in memory at larger sizes than actually required, files in this format may be larger than necessary. However, the CAS addresses of feature structures are preserved in this format. When the data is loaded back into a CAS, that CAS must have been initialized with the same type system as the original CAS. | must be the same | yes |
SERIALIZED_TSI or S+ | CAS structures are dumped to disk as they are using Java serialization as in format S, but now using the CASCompleteSerializer, which includes CAS metadata such as the type system and index repositories. | is reinitialized | yes |
BINARY or 0 | CAS structures are dumped to disk as they are using Java serialization (CASSerializer). This is basically the same as format S, but includes a UIMA header and can be read using Serialization.deserializeCAS(org.apache.uima.cas.CAS, java.io.InputStream). | must be the same | yes |
BINARY_TSI or 0 | The same as BINARY, except that the type system and index configuration are also stored in the file. However, lenient loading or reinitializing the CAS with this information is presently not supported. | must be the same | yes |
COMPRESSED or 4 | UIMA binary serialization saving all feature structures (reachable or not). This format internally uses gzip compression and a binary representation of the CAS, making it much more efficient than format 0. | must be the same | yes |
COMPRESSED_FILTERED or 6 | UIMA binary serialization as format 4, but saving only reachable feature structures. | must be the same | no |
6+ | This is a legacy format specific to DKPro Core. Since UIMA 2.9.0, COMPRESSED_FILTERED_TSI is supported and should be used instead of this format. UIMA binary serialization as format 6, but also contains the type system definition. This allows the BinaryCasReader to load data leniently into a CAS that has been initialized with a different type system. | lenient loading | no |
COMPRESSED_FILTERED_TS | Same as COMPRESSED_FILTERED, but also contains the type system definition. This allows the BinaryCasReader to load data leniently into a CAS that has been initialized with a different type system. | lenient loading | no |
COMPRESSED_FILTERED_TSI | Default. UIMA binary serialization as format 6, but also contains the type system definition and index definitions. This allows the BinaryCasReader to load data leniently into a CAS that has been initialized with a different type system. | lenient loading | no |
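
The last two columns of the format table can be summarized in a small lookup. The following is only an illustrative sketch: `BinaryFormatTraits` and its methods are hypothetical names that do not exist in DKPro Core or UIMA, and the sets simply restate the rows above. The short name 0 is treated as BINARY here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper (not a DKPro Core or UIMA class) restating the format table. */
final class BinaryFormatTraits {
    // Formats whose "CAS Addresses preserved" column says "yes".
    private static final Set<String> PRESERVES_ADDRESSES = new HashSet<>(Arrays.asList(
            "SERIALIZED", "S", "SERIALIZED_TSI", "S+",
            "BINARY", "0", "BINARY_TSI", "COMPRESSED", "4"));

    // Formats whose "Type system on load" column says "lenient loading".
    private static final Set<String> LENIENT = new HashSet<>(Arrays.asList(
            "6+", "COMPRESSED_FILTERED_TS", "COMPRESSED_FILTERED_TSI"));

    // Remaining formats: addresses not preserved, type system must be the same.
    private static final Set<String> FILTERED_STRICT = new HashSet<>(Arrays.asList(
            "COMPRESSED_FILTERED", "6"));

    private BinaryFormatTraits() {
    }

    private static void requireKnown(String format) {
        if (!PRESERVES_ADDRESSES.contains(format) && !LENIENT.contains(format)
                && !FILTERED_STRICT.contains(format)) {
            throw new IllegalArgumentException("Format not listed in the table: " + format);
        }
    }

    /** True if the table lists the format as preserving CAS addresses. */
    static boolean preservesAddresses(String format) {
        requireKnown(format);
        return PRESERVES_ADDRESSES.contains(format);
    }

    /** True if the table lists the format as loadable into a different type system. */
    static boolean supportsLenientLoading(String format) {
        requireKnown(format);
        return LENIENT.contains(format);
    }
}
```

A lookup like this makes the trade-off visible: according to the table, no format both preserves CAS addresses and supports lenient loading.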
JCasFileWriter_ImplBase.NamedOutputStream
Modifier and Type | Field and Description |
---|---|
static String | AUTO |
static String | PARAM_FILENAME_EXTENSION: The file extension. |
static String | PARAM_FORMAT |
static String | PARAM_TYPE_SYSTEM_LOCATION: Location to write the type system to. |
JAR_PREFIX, PARAM_COMPRESSION, PARAM_ESCAPE_DOCUMENT_ID, PARAM_OVERWRITE, PARAM_SINGULAR_TARGET, PARAM_STRIP_EXTENSION, PARAM_TARGET_LOCATION, PARAM_USE_DOCUMENT_ID
Constructor and Description |
---|
BinaryCasWriter() |
Modifier and Type | Method and Description |
---|---|
void | initialize(org.apache.uima.UimaContext aContext) |
void | process(org.apache.uima.jcas.JCas aJCas) |
collectionProcessComplete, getCompressionMethod, getOutputStream, getOutputStream, getRelativePath, getTargetLocation, isStripExtension, isUseDocumentId
getRequiredCasInterface, process
getCasInstancesRequired, hasNext, next
public static final String AUTO
public static final String PARAM_TYPE_SYSTEM_LOCATION
Location to write the type system to, e.g. typesystem.ser. The JCasFileWriter_ImplBase.PARAM_COMPRESSION parameter has no effect on the type system. Instead, whether the type system file is compressed is detected from the file name extension (e.g. ".gz"). The SerializedCasReader currently cannot read such files. Use this only if you really know what you are doing.
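Detecting compression from the file name extension, as described for the type system file, can be sketched with the JDK alone. This is a minimal illustration, not the DKPro Core implementation: `TypeSystemFileSketch`, `looksGzipped`, and `encode` are hypothetical names, only the ".gz" extension mentioned above is handled, and the payload is an arbitrary byte array standing in for the serialized type system.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: choose compression from the target file name only. */
final class TypeSystemFileSketch {
    private TypeSystemFileSketch() {
    }

    /** Assumption: only ".gz" is recognized in this sketch. */
    static boolean looksGzipped(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".gz");
    }

    /** Returns the bytes that would be written for the given target file name. */
    static byte[] encode(String fileName, byte[] payload) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Wrap the raw stream in gzip only when the name asks for it.
        try (OutputStream out = looksGzipped(fileName)
                ? new GZIPOutputStream(buffer)
                : buffer) {
            out.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

With this approach a target such as typesystem.ser.gz yields a gzip stream, while typesystem.ser is written unchanged, regardless of any separate compression setting.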
public static final String PARAM_FORMAT
public static final String PARAM_FILENAME_EXTENSION
The file extension. If set to AUTO, the extension is chosen based on the default extension specified by the UIMA SerialFormat class. However, this only works when using the new long format names (e.g. COMPRESSED_FILTERED_TSI). When using the old short names (e.g. 6), the default extension .bin is used.
public void initialize(org.apache.uima.UimaContext aContext) throws org.apache.uima.resource.ResourceInitializationException
Specified by: initialize in interface org.apache.uima.analysis_component.AnalysisComponent
Overrides: initialize in class org.apache.uima.fit.component.JCasConsumer_ImplBase
Throws: org.apache.uima.resource.ResourceInitializationException
public void process(org.apache.uima.jcas.JCas aJCas) throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
Specified by: process in class org.apache.uima.analysis_component.JCasAnnotator_ImplBase
Throws: org.apache.uima.analysis_engine.AnalysisEngineProcessException
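The extension rule documented for PARAM_FILENAME_EXTENSION (an explicit value wins; AUTO uses a per-format default for long names and .bin for short names) can be sketched as follows. Everything here is an assumption made for illustration: `FilenameExtensionSketch` and `resolve` are hypothetical, `AUTO_MARKER` is a placeholder because the real value of AUTO is not shown on this page, and the map of long-name defaults stands in for whatever the UIMA SerialFormat class reports.

```java
import java.util.Map;

/** Hypothetical sketch of the documented AUTO extension rule. */
final class FilenameExtensionSketch {
    /** Placeholder marker; the actual value of the AUTO constant is an assumption. */
    static final String AUTO_MARKER = "auto";

    /** Fallback documented for the old short format names. */
    static final String SHORT_NAME_DEFAULT = ".bin";

    private FilenameExtensionSketch() {
    }

    /**
     * @param configured       the configured extension, or AUTO_MARKER
     * @param format           the configured format name, long or short
     * @param longNameDefaults caller-supplied defaults keyed by long format name
     */
    static String resolve(String configured, String format,
            Map<String, String> longNameDefaults) {
        if (!AUTO_MARKER.equals(configured)) {
            return configured; // an explicit extension is used as given
        }
        String byFormat = longNameDefaults.get(format);
        // Short names such as "6" have no entry, so the .bin fallback applies.
        return byFormat != null ? byFormat : SHORT_NAME_DEFAULT;
    }
}
```

Passing the defaults in as a map keeps the sketch independent of the actual extensions UIMA assigns to each format.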
Copyright © 2007–2018 Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt. All rights reserved.