public class ExtractReuters extends Object
This is an adaptation of the ExtractReuters class in the lucene-benchmarks package.
Constructor and Description |
---|
ExtractReuters() |
Modifier and Type | Method and Description |
---|---|
static List<ReutersDocument> | extract(Path reutersDir) Read all the SGML files in the given directory. |
static List<ReutersDocument> | extractFile(InputStream sgmFile, URI uri) Read the documents out of a single file. |
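The two methods split the work between scanning a directory and reading a single stream with its source URI. A caller who wants per-file control could enumerate the directory and hand each file to extractFile(InputStream, URI). The following is a minimal sketch of that enumeration using only the Java standard library. The class name ReutersFileLister, the helper listSgmlFiles, and the `*.sgm` glob are assumptions made for illustration and are not part of this API; the call to ExtractReuters appears only as a comment because its package is not stated on this page.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the documented API.
public class ReutersFileLister {

    /**
     * Collects the URIs of all files in the directory whose names match
     * the glob "*.sgm" (an assumed naming convention), sorted by URI.
     */
    static List<URI> listSgmlFiles(Path reutersDir) throws IOException {
        List<URI> uris = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(reutersDir, "*.sgm")) {
            for (Path file : stream) {
                uris.add(file.toUri());
            }
        }
        Collections.sort(uris); // directory iteration order is unspecified
        return uris;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: ReutersFileLister <reutersDir>");
            return;
        }
        for (URI uri : listSgmlFiles(Paths.get(args[0]))) {
            try (InputStream in = Files.newInputStream(Paths.get(uri))) {
                // With the library on the classpath, one would call here:
                // List<ReutersDocument> docs = ExtractReuters.extractFile(in, uri);
                System.out.println(uri);
            }
        }
    }
}
```

When no per-file handling is needed, extract(Path reutersDir) covers the whole directory in a single call.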
public static List<ReutersDocument> extract(Path reutersDir) throws IOException, ParseException
Parameters:
reutersDir - the directory that contains the Reuters SGML files.
Returns:
a list of ReutersDocuments
Throws:
IOException - if any of the files cannot be read.
ParseException - if there was a problem parsing a date
public static List<ReutersDocument> extractFile(InputStream sgmFile, URI uri) throws IOException, ParseException
Parameters:
sgmFile - an InputStream of a Reuters SGML file.
uri - a URI pointing to the original SGML file location
Returns:
the ReutersDocuments extracted from the input stream
Throws:
IOException - if any of the files cannot be read.
ParseException - if there was a problem parsing a date
Copyright © 2007–2018 Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt. All rights reserved.