LuceneIndexer (DKPro Core 1.9.0 API)

java.lang.Object
- de.tudarmstadt.ukp.dkpro.core.decompounding.web1t.LuceneIndexer

```
public class LuceneIndexer
extends Object
```
Indexes the Google Web1T corpus in Lucene. All values are stored in the index. The fields are:

- gram: The n-gram
- freq: The frequency of the n-gram in the corpus

Note: This was only tested with the German corpus of Web1T. The English one is much bigger, and Lucene can only handle Integer.MAX_VALUE (2 147 483 647) documents per index. Each n-gram is a document.

The /bin folder contains a script file to run the indexer. Simple run:

    ./bin/web1TLuceneIndexer.sh \
      --web1t PATH/TO/FOLDER/WITH/ALL/EXTRACTED/N-GRAM/FILES \
      --outputPath PATH/TO/LUCENE/INDEX/FOLDER
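To make the gram and freq fields concrete, the following minimal sketch splits one line of an extracted n-gram file into those two values. It assumes the usual Web1T layout of an n-gram followed by a tab and its count. The class `NGramEntry` and the sample line are hypothetical and are not part of DKPro Core.

```java
import java.util.Objects;

/** Hypothetical holder for one Web1T entry: the n-gram text and its frequency. */
final class NGramEntry {
    final String gram;
    final long freq;

    NGramEntry(String gram, long freq) {
        this.gram = Objects.requireNonNull(gram);
        this.freq = freq;
    }

    /**
     * Parses a line of the form "token1 token2<TAB>count".
     * Assumption: the last tab separates the n-gram from its count.
     */
    static NGramEntry parse(String line) {
        int tab = line.lastIndexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("No tab separator in line: " + line);
        }
        String gram = line.substring(0, tab);
        long freq = Long.parseLong(line.substring(tab + 1).trim());
        return new NGramEntry(gram, freq);
    }

    public static void main(String[] args) {
        // Invented sample line, not taken from the corpus.
        NGramEntry entry = NGramEntry.parse("Haus der Kunst\t1234");
        System.out.println(entry.gram + " -> " + entry.freq);
    }
}
```

Each parsed entry corresponds to one Lucene document in the index, which is why the document limit mentioned above matters for large corpora.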

Nested Class Summary

Nested Classes
Modifier and Type Class and Description

protected static class LuceneIndexer.Worker
A Worker thread.

Constructor Summary

Constructors
Constructor and Description
`LuceneIndexer(File aWeb1tFolder, File aOutputPath)` Constructor to create an indexer instance
`LuceneIndexer(File aWeb1tFolder, File aOutputPath, int aIndexes)` Constructor to create an indexer instance

Method Summary

Modifier and Type	Method and Description
`Dictionary`	`getDictionary()`
`void`	`index()` Create the index.
`void`	`setDictionary(Dictionary aDictionary)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - LuceneIndexer
```
public LuceneIndexer(File aWeb1tFolder,
                     File aOutputPath)
```
    Constructor to create an indexer instance
    
    Parameters:
    
    aWeb1tFolder - The folder with all extracted n-gram files
    
    aOutputPath - The Lucene index folder
  - LuceneIndexer
```
public LuceneIndexer(File aWeb1tFolder,
                     File aOutputPath,
                     int aIndexes)
```
    Constructor to create an indexer instance
    
    Parameters:
    
    aWeb1tFolder - The folder with all extracted n-gram files
    
    aOutputPath - The Lucene index folder
    
    aIndexes - The number of indexes
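As a rough aid for choosing aIndexes, the sketch below computes the smallest number of indexes that keeps each one within the per-index document limit stated in the class description. It assumes that n-grams are spread evenly across the indexes, which this page does not confirm. The class `IndexCountEstimate` and the example total are hypothetical.

```java
/** Hypothetical helper: estimates how many indexes a corpus of a given size needs. */
final class IndexCountEstimate {
    /** Per-index document limit as stated in the class description. */
    static final long MAX_DOCS_PER_INDEX = Integer.MAX_VALUE;

    /** Returns the smallest index count such that no index exceeds the limit. */
    static int minIndexes(long totalNGrams) {
        if (totalNGrams < 0) {
            throw new IllegalArgumentException("totalNGrams must not be negative");
        }
        if (totalNGrams == 0) {
            return 1; // an empty corpus still needs one index
        }
        // Ceiling division without floating point.
        long count = (totalNGrams + MAX_DOCS_PER_INDEX - 1) / MAX_DOCS_PER_INDEX;
        return Math.toIntExact(count);
    }

    public static void main(String[] args) {
        // Invented total, used only to illustrate the arithmetic.
        System.out.println(minIndexes(3_000_000_000L)); // prints 2
    }
}
```

In practice a larger value may be preferable, since the result is only a lower bound under the even-distribution assumption.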
- Method Detail
  - index
```
public void index()
           throws FileNotFoundException,
                  InterruptedException
```
    Create the index. This is a very long-running operation. It writes some information to stdout.
    
    Throws:
    
    FileNotFoundException - if the index could not be found.
    
    InterruptedException - if threads were interrupted.
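Because index() declares two checked exceptions, callers have to handle both. The sketch below shows one possible way to do that on the caller side. `IndexTask` is a hypothetical stand-in interface that only mirrors the signature of index(), so the example compiles without DKPro Core on the classpath; real code would call index() on a LuceneIndexer instance instead.

```java
import java.io.FileNotFoundException;

/** Hypothetical caller-side wrapper around a long-running index() call. */
final class IndexRunner {
    /** Stand-in with the same checked exceptions as LuceneIndexer.index(). */
    interface IndexTask {
        void index() throws FileNotFoundException, InterruptedException;
    }

    /** Runs the task and returns true on success, false if it failed. */
    static boolean run(IndexTask task) {
        try {
            task.index();
            return true;
        } catch (FileNotFoundException e) {
            System.err.println("Missing file or folder: " + e.getMessage());
            return false;
        } catch (InterruptedException e) {
            // Restore the interrupt flag so code further up can react to it.
            Thread.currentThread().interrupt();
            System.err.println("Indexing was interrupted");
            return false;
        }
    }

    public static void main(String[] args) {
        // Real code would call indexer.index() inside the lambda.
        boolean ok = run(() -> { });
        System.out.println(ok ? "done" : "failed");
    }
}
```

Restoring the interrupt flag is a common convention when an InterruptedException is caught and not rethrown; whether it is appropriate depends on the calling application.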
  - getDictionary
```
public Dictionary getDictionary()
```
  - setDictionary
```
public void setDictionary(Dictionary aDictionary)
```


Copyright © 2007–2018 Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt. All rights reserved.