public class TextFormatVectorizer extends Object implements Vectorizer
Modifier and Type | Method and Description |
---|---|
boolean | contains(String token): True if the token is known by the vectorizer. |
int | dimensions(): The dimensionality of the embeddings. |
boolean | isCaseless() |
static Vectorizer | load(File f): Load a text-format embeddings file (assuming no header line). |
static Vectorizer | load(File embeddingsFile, boolean hasHeaderLine): Load a text-format embeddings file. |
int | size(): The total number of known tokens. |
float[] | unknownVector(): The vector for unknown tokens. |
float[] | vectorize(String token): Get the vector for a token. |
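The two load overloads differ only in whether the first line of the file is treated as a header. As a rough illustration of what that distinction means, the sketch below parses text-format embeddings under the assumption that the file follows the common word2vec/GloVe text layout: one token per line, followed by whitespace-separated float components, with an optional first line that is skipped. The class TextEmbeddingsSketch and its parseLines method are hypothetical names introduced here; they are not part of this library, and the actual TextFormatVectorizer may read the file differently.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not the library's implementation.
final class TextEmbeddingsSketch {

    // Parses lines of the assumed form "token v1 v2 ... vn".
    // If hasHeaderLine is true, the first line is skipped without being interpreted.
    static Map<String, float[]> parseLines(List<String> lines, boolean hasHeaderLine) {
        Map<String, float[]> vectors = new LinkedHashMap<>();
        int dimensions = -1;
        for (int n = hasHeaderLine ? 1 : 0; n < lines.size(); n++) {
            String line = lines.get(n).trim();
            if (line.isEmpty()) {
                continue; // tolerate blank lines
            }
            String[] fields = line.split("\\s+");
            float[] vector = new float[fields.length - 1];
            for (int i = 1; i < fields.length; i++) {
                vector[i - 1] = Float.parseFloat(fields[i]);
            }
            if (dimensions == -1) {
                dimensions = vector.length; // the first vector fixes the dimensionality
            } else if (vector.length != dimensions) {
                throw new IllegalArgumentException(
                        "Inconsistent vector length for token: " + fields[0]);
            }
            vectors.put(fields[0], vector);
        }
        return vectors;
    }

    public static void main(String[] args) {
        // Made-up sample data: a header line followed by two 3-dimensional vectors.
        List<String> lines = Arrays.asList("2 3", "king 0.1 0.2 0.3", "queen 0.4 0.5 0.6");
        Map<String, float[]> vectors = parseLines(lines, true);
        System.out.println(vectors.size() + " tokens, "
                + vectors.get("king").length + " dimensions");
    }
}
```

To apply the sketch to a real file, the lines could be obtained with java.nio.file.Files.readAllLines; passing false for hasHeaderLine on a file that does have a header would, under these assumptions, cause the header to be parsed as an ordinary token line.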
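public static Vectorizer load(File f) throws IOException
Load a text-format embeddings file (assuming no header line).
Parameters:
f - the File containing the embeddings in text format
Returns:
TextFormatVectorizer
Throws:
IOException - if an I/O error occurs

public static Vectorizer load(File embeddingsFile, boolean hasHeaderLine) throws IOException
Load a text-format embeddings file.
Parameters:
embeddingsFile - the File containing the embeddings in text format
hasHeaderLine - if true, the first line in the file is expected to be a header line
Returns:
TextFormatVectorizer
Throws:
IOException - if an I/O error occurs

public float[] vectorize(String token)
Description copied from interface: Vectorizer
Get the vector for a token (see also Vectorizer.unknownVector()).
Specified by:
vectorize in interface Vectorizer
Parameters:
token - a token String

public boolean contains(String token)
Description copied from interface: Vectorizer
True if the token is known by the vectorizer.
Specified by:
contains in interface Vectorizer
Parameters:
token - a token String

public float[] unknownVector()
Description copied from interface: Vectorizer
The vector for unknown tokens.
Specified by:
unknownVector in interface Vectorizer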
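public int dimensions()
Description copied from interface: Vectorizer
The dimensionality of the embeddings.
Specified by:
dimensions in interface Vectorizer

public int size()
Description copied from interface: Vectorizer
The total number of known tokens.
Specified by:
size in interface Vectorizer

public boolean isCaseless()
Specified by:
isCaseless in interface Vectorizer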
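
To make the relationship between contains, vectorize, unknownVector, and isCaseless more concrete, here is a minimal sketch of a lookup table with a fallback vector. It rests on two assumptions that this page does not confirm: that vectorize returns the unknown vector for tokens that are not known, and that a caseless vectorizer lowercases tokens before lookup. The class LookupSketch is hypothetical and only mirrors the method names listed above; it is not the library's implementation.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of the Vectorizer contract; not the library's code.
final class LookupSketch {
    private final Map<String, float[]> vectors = new HashMap<>();
    private final float[] unknownVector;
    private final boolean caseless;

    LookupSketch(Map<String, float[]> source, float[] unknownVector, boolean caseless) {
        this.unknownVector = unknownVector;
        this.caseless = caseless;
        // Assumption: in caseless mode, keys are stored lowercased, so tokens
        // that differ only by case collapse into a single entry.
        for (Map.Entry<String, float[]> e : source.entrySet()) {
            vectors.put(normalize(e.getKey()), e.getValue());
        }
    }

    private String normalize(String token) {
        return caseless ? token.toLowerCase(Locale.ROOT) : token;
    }

    boolean contains(String token) {
        return vectors.containsKey(normalize(token));
    }

    // Assumption: unknown tokens fall back to the unknown vector.
    float[] vectorize(String token) {
        float[] vector = vectors.get(normalize(token));
        return vector != null ? vector : unknownVector;
    }

    float[] unknownVector() {
        return unknownVector;
    }

    int dimensions() {
        return unknownVector.length;
    }

    int size() {
        return vectors.size();
    }

    boolean isCaseless() {
        return caseless;
    }

    public static void main(String[] args) {
        Map<String, float[]> source = new HashMap<>();
        source.put("King", new float[] {1f, 2f});
        LookupSketch sketch = new LookupSketch(source, new float[] {0f, 0f}, true);
        System.out.println(sketch.contains("KING"));  // known after lowercasing
        System.out.println(sketch.contains("queen")); // not in the table
    }
}
```

Under these assumptions, callers that need to distinguish a genuine embedding from the fallback would check contains before calling vectorize.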
Copyright © 2007–2018 Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt. All rights reserved.