public class Pdf2CasConverter extends PdfLayoutEventStripper
Inherited nested classes: PdfLayoutEventStripper.Style, PdfLayoutEventStripper.Values
Inherited fields: charactersByArticle
Constructor and Description |
---|
Pdf2CasConverter() |
Modifier and Type | Method and Description |
---|---|
protected void | endDocument(org.apache.pdfbox.pdmodel.PDDocument aPdf) This method is available for subclasses of this class. |
protected void | endPage(int aStartPage, int aEndPage, int aCurrentPage, org.apache.pdfbox.pdmodel.PDPage page) End a page. |
protected void | endRegion(PdfLayoutEventStripper.Style aStyle) End a region. |
String | getHeadingType() |
String | getParagraphType() |
Trie<String> | getSubstitutionTable() |
protected void | processLineSeparator() |
protected void | processWordSeparator() |
void | setHeadingType(String aHeadingType) |
void | setParagraphType(String aParagraphType) |
void | setSubstitutionTable(Trie<String> aSubstitutionTable) |
protected void | startDocument(org.apache.pdfbox.pdmodel.PDDocument aPdf) This method is available for subclasses of this class. |
protected void | startPage(int aFirstPage, int aLastPage, int aCurrentPage, org.apache.pdfbox.pdmodel.PDPage page) Start a new page. |
protected void | startRegion(PdfLayoutEventStripper.Style aStyle) Start a new region. |
protected void | writeCharacters(TextPosition aText) Write the string to the output stream. |
void | writeText(org.apache.uima.cas.CAS aCas, InputStream aIs) |
Inherited methods: getCharactersByArticle, getCurrentPageNo, getEndPage, getStartPage, getStyle, processArticle, processPage, processPages, processTextPosition, setEndPage, setShouldSeparateByBeads, setStartPage, setSuppressDuplicateOverlappingText, shouldSeparateByBeads, shouldSuppressDuplicateOverlappingText, writeText
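The callback names in the method summary suggest an event-driven design: a stripper fires region, word, and line events, and a subclass turns them into text plus region spans. The following is a minimal, self-contained sketch of that pattern only. `RegionEventSketch`, `EventSource`, `Collector`, `emitRegion`, and the two `Style` values are hypothetical stand-ins, and strings replace `TextPosition`; none of this is the actual DKPro Core or PDFBox code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; none of these are DKPro Core or PDFBox classes.
class RegionEventSketch {
    // Assumed style values; the real PdfLayoutEventStripper.Style may differ.
    enum Style { HEADING, PARAGRAPH }

    /** Minimal event source that borrows the callback names from the method summary. */
    abstract static class EventSource {
        protected abstract void startRegion(Style style);
        protected abstract void endRegion(Style style);
        protected abstract void writeCharacters(String text);
        protected abstract void processWordSeparator();
        protected abstract void processLineSeparator();

        /** Fires the callbacks for one region; each inner array holds the words of one line. */
        void emitRegion(Style style, String[][] lines) {
            startRegion(style);
            for (int l = 0; l < lines.length; l++) {
                if (l > 0) {
                    processLineSeparator();
                }
                for (int w = 0; w < lines[l].length; w++) {
                    if (w > 0) {
                        processWordSeparator();
                    }
                    writeCharacters(lines[l][w]);
                }
            }
            endRegion(style);
        }
    }

    /** Collects the text and records one "STYLE[begin,end]" span per region. */
    static class Collector extends EventSource {
        final StringBuilder text = new StringBuilder();
        final List<String> spans = new ArrayList<>();
        private int regionStart;

        @Override protected void startRegion(Style style) {
            regionStart = text.length();
        }

        @Override protected void endRegion(Style style) {
            spans.add(style + "[" + regionStart + "," + text.length() + "]");
            text.append('\n'); // keep regions on separate lines
        }

        @Override protected void writeCharacters(String s) { text.append(s); }
        @Override protected void processWordSeparator() { text.append(' '); }
        @Override protected void processLineSeparator() { text.append('\n'); }
    }

    public static void main(String[] args) {
        Collector c = new Collector();
        c.emitRegion(Style.HEADING, new String[][] {{"PDF", "Basics"}});
        c.emitRegion(Style.PARAGRAPH, new String[][] {{"First", "line"}, {"second", "line"}});
        System.out.println(c.spans); // prints [HEADING[0,10], PARAGRAPH[11,33]]
    }
}
```

Recording the offset in `startRegion` and closing the span in `endRegion` keeps the span boundaries aligned with the collected text, which is the kind of bookkeeping a CAS-producing subclass would plausibly need.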
public Pdf2CasConverter() throws IOException
Throws:
IOException

public void writeText(org.apache.uima.cas.CAS aCas, InputStream aIs) throws IOException
Throws:
IOException

protected void startDocument(org.apache.pdfbox.pdmodel.PDDocument aPdf) throws IOException
Description copied from class: PdfLayoutEventStripper
This method is available for subclasses of this class.
Overrides: startDocument in class PdfLayoutEventStripper
Parameters:
aPdf - The PDF document that is being processed.
Throws:
IOException - If an IO error occurs.

protected void endDocument(org.apache.pdfbox.pdmodel.PDDocument aPdf) throws IOException
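Description copied from class: PdfLayoutEventStripper
This method is available for subclasses of this class.
Overrides: endDocument in class PdfLayoutEventStripper
Parameters:
aPdf - The PDF document that is being processed.
Throws:
IOException - If an IO error occurs.

protected void processLineSeparator() throws IOException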
Overrides: processLineSeparator in class PdfLayoutEventStripper
Throws:
IOException

protected void processWordSeparator() throws IOException
Overrides: processWordSeparator in class PdfLayoutEventStripper
Throws:
IOException

protected void startPage(int aFirstPage, int aLastPage, int aCurrentPage, org.apache.pdfbox.pdmodel.PDPage page) throws IOException
Description copied from class: PdfLayoutEventStripper
Start a new page.
Overrides: startPage in class PdfLayoutEventStripper
Parameters:
aFirstPage - first page.
aLastPage - last page.
aCurrentPage - current page.
page - The page we are about to process.
Throws:
IOException - If there is any error writing to the stream.

protected void endPage(int aStartPage, int aEndPage, int aCurrentPage, org.apache.pdfbox.pdmodel.PDPage page) throws IOException
Description copied from class: PdfLayoutEventStripper
End a page.
Overrides: endPage in class PdfLayoutEventStripper
Parameters:
aStartPage - first page.
aEndPage - last page.
aCurrentPage - current page.
page - The page we are about to process.
Throws:
IOException - If there is any error writing to the stream.

protected void startRegion(PdfLayoutEventStripper.Style aStyle) throws IOException
Description copied from class: PdfLayoutEventStripper
Start a new region.
Overrides: startRegion in class PdfLayoutEventStripper
Parameters:
aStyle - the style.
Throws:
IOException - If there is any error writing to the stream.

protected void endRegion(PdfLayoutEventStripper.Style aStyle) throws IOException
Description copied from class: PdfLayoutEventStripper
End a region.
Overrides: endRegion in class PdfLayoutEventStripper
Parameters:
aStyle - the style.
Throws:
IOException - If there is any error writing to the stream.

protected void writeCharacters(TextPosition aText) throws IOException
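Description copied from class: PdfLayoutEventStripper
Write the string to the output stream.
Overrides: writeCharacters in class PdfLayoutEventStripper
Parameters:
aText - The text to write to the stream.
Throws:
IOException - If there is an error when writing the text.

public String getParagraphType()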
public void setParagraphType(String aParagraphType)
public String getHeadingType()
public void setHeadingType(String aHeadingType)
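The four accessors above take and return plain strings. One plausible reading, stated here as an assumption, is that each string names the annotation type to create for a region of the matching style. The sketch below shows that lookup in isolation. `RegionTypeSketch`, `typeFor`, the `Style` values, and the `example.*` type names are all hypothetical and not taken from DKPro Core.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical illustration; not the Pdf2CasConverter implementation.
class RegionTypeSketch {
    // Assumed style values; the real PdfLayoutEventStripper.Style may differ.
    enum Style { HEADING, PARAGRAPH }

    // One configured type name per region style.
    private final Map<Style, String> typeByStyle = new EnumMap<>(Style.class);

    String getHeadingType() { return typeByStyle.get(Style.HEADING); }
    void setHeadingType(String aHeadingType) { typeByStyle.put(Style.HEADING, aHeadingType); }

    String getParagraphType() { return typeByStyle.get(Style.PARAGRAPH); }
    void setParagraphType(String aParagraphType) { typeByStyle.put(Style.PARAGRAPH, aParagraphType); }

    /** Returns the type name configured for the style, or null if none was set. */
    String typeFor(Style style) {
        return typeByStyle.get(style);
    }

    public static void main(String[] args) {
        RegionTypeSketch config = new RegionTypeSketch();
        config.setHeadingType("example.Heading");     // hypothetical type name
        config.setParagraphType("example.Paragraph"); // hypothetical type name
        System.out.println(config.typeFor(Style.HEADING));
        System.out.println(config.typeFor(Style.PARAGRAPH));
    }
}
```

Returning null for an unset style lets a caller skip annotation for that kind of region. Whether the real converter treats a null type this way is not stated in this page.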
Copyright © 2007–2018 Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt. All rights reserved.