public class TextFormatVectorizerUtils extends Object
Constructor and Description |
---|
TextFormatVectorizerUtils() |
Modifier and Type | Method and Description |
---|---|
static void | convertMalletEmbeddingsToBinary(File malletEmbeddings, boolean aCaseless, Locale aLocale, File targetFile) Read a (compressed) Mallet embeddings file (in text format) and convert it into the binary format using BinaryWordVectorUtils. |
static void | convertMalletEmbeddingsToBinary(File malletEmbeddings, File targetFile) Read a (compressed) Mallet embeddings file (in text format) and convert it into the binary format using BinaryWordVectorUtils. |
static Map<String,float[]> | readEmbeddingFileTxt(File file, boolean hasHeader) Read an embeddings file in text format. |
static Map<String,float[]> | readEmbeddingFileTxt(InputStream inputStream, boolean hasHeader) Read embeddings in text format from an InputStream. |
public static Map<String,float[]> readEmbeddingFileTxt(File file, boolean hasHeader) throws IOException
Read an embeddings file in text format. If hasHeader is set to true, the first line is expected to contain the size and dimensionality of the vectors. This is typically true for files generated by Word2Vec (in text format).
Parameters:
file - the input file
hasHeader - if true, read size and dimensionality from the first line
Returns:
a Map<String, float[]> mapping each token to a vector.
Throws:
IOException - if the input file cannot be read
See Also:
readEmbeddingFileTxt(InputStream, boolean)
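The text format described above can be illustrated with a small, self-contained sketch. It is not the DKPro implementation of readEmbeddingFileTxt: the class name EmbeddingTextFormatSketch and its read method are hypothetical, and the sketch assumes UTF-8 input, whitespace-separated fields, and a header of the form "<size> <dimensionality>" as typically written by Word2Vec in text mode.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; NOT the library's code.
class EmbeddingTextFormatSketch {

    // Reads lines of the form "<token> <value1> <value2> ...".
    // Assumptions: UTF-8 encoding, whitespace-separated fields, and an
    // optional first line "<size> <dimensionality>" when hasHeader is true.
    static Map<String, float[]> read(InputStream in, boolean hasHeader) throws IOException {
        Map<String, float[]> vectors = new LinkedHashMap<>();
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));

        int expectedDim = -1; // -1 means "unknown, do not check"
        if (hasHeader) {
            String header = reader.readLine();
            if (header == null) {
                throw new IOException("Missing header line");
            }
            String[] parts = header.trim().split("\\s+");
            if (parts.length != 2) {
                throw new IOException("Malformed header: " + header);
            }
            expectedDim = Integer.parseInt(parts[1]);
        }

        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split("\\s+");
            float[] vector = new float[fields.length - 1];
            for (int i = 1; i < fields.length; i++) {
                vector[i - 1] = Float.parseFloat(fields[i]);
            }
            if (expectedDim >= 0 && vector.length != expectedDim) {
                throw new IOException("Unexpected dimensionality for token " + fields[0]);
            }
            vectors.put(fields[0], vector);
        }
        return vectors;
    }

    public static void main(String[] args) throws IOException {
        // Two tokens with three dimensions each, preceded by a header line.
        String text = "2 3\nking 0.1 0.2 0.3\nqueen 0.4 0.5 0.6\n";
        InputStream in = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
        Map<String, float[]> vectors = read(in, true);
        System.out.println(vectors.keySet());
    }
}
```

With hasHeader set to false, the sketch treats the first line as an ordinary token line, which matches the documented meaning of the flag; the dimensionality check is an extra safeguard of the sketch, not documented behaviour of the library.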
public static Map<String,float[]> readEmbeddingFileTxt(InputStream inputStream, boolean hasHeader) throws IOException
Read embeddings in text format from an InputStream. Each line is expected to have the form <token> <value1> <value2> ...
Parameters:
inputStream - an InputStream
hasHeader - if true, read size and dimensionality from the first line
Returns:
a Map<String, float[]> mapping each token to a vector.
Throws:
IOException - if the input stream cannot be read
public static void convertMalletEmbeddingsToBinary(File malletEmbeddings, File targetFile) throws IOException
Read a (compressed) Mallet embeddings file (in text format) and convert it into the binary format using BinaryWordVectorUtils.
Parameters:
malletEmbeddings - a File holding embeddings in text format
targetFile - the output File
Throws:
IOException - if an I/O error occurs.
public static void convertMalletEmbeddingsToBinary(File malletEmbeddings, boolean aCaseless, Locale aLocale, File targetFile) throws IOException
Read a (compressed) Mallet embeddings file (in text format) and convert it into the binary format using BinaryWordVectorUtils.
Parameters:
malletEmbeddings - a File holding embeddings in text format
aCaseless - if true, all input tokens are expected to be caseless
aLocale - the Locale to use
targetFile - the output File
Throws:
IOException - if an I/O error occurs.
Copyright © 2007–2018 Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt. All rights reserved.