I am trying to extract the pdf text using Tabula. But the code has no errors but when i run the extracted pdf text does not get displayed in console. Could some one help.
I have been using PDFBox and after doing some research, i have found that tabula is new and wanted to try it.
File file = new File(pdfFilePath);
PDDocument document = PDDocument.load(file);
ObjectExtractor oe = new ObjectExtractor(document);
Page page = oe.extract(1) //1st page
TextStripper textStripper = new TextStripper(document,1);
System.out.println(textStripper.getText(document));
output of pdf text
You are not using the page variable. Try the following code.
File file = new File(pdfFilePath);
PDDocument document = PDDocument.load(file);
ObjectExtractor oe = new ObjectExtractor(document);
Page page = oe.extract(1); // 1st page
for (TextElement textElement: page.getText()) {
System.out.print(textElement.getText());
}