Extracted PDF text is not displayed in the console


I am trying to extract text from a PDF using Tabula. The code compiles without errors, but when I run it, the extracted text is not displayed in the console. Could someone help?

I have been using PDFBox, and after some research I found that Tabula is new and wanted to try it.

File file = new File(pdfFilePath);
PDDocument document = PDDocument.load(file);
ObjectExtractor oe = new ObjectExtractor(document);
Page page = oe.extract(1); // 1st page
TextStripper textStripper = new TextStripper(document,1);
System.out.println(textStripper.getText(document));

[Image: output of pdf text]

Solution

  • You extract the page but never use the page variable. Read the text from the page itself, as in the following code.

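    import java.io.File;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import technology.tabula.ObjectExtractor;
    import technology.tabula.Page;
    import technology.tabula.TextElement;

    File file = new File(pdfFilePath);
    PDDocument document = PDDocument.load(file);
    ObjectExtractor oe = new ObjectExtractor(document);
    Page page = oe.extract(1); // 1st page

    // Print the text of each element extracted from the page
    for (TextElement textElement : page.getText()) {
      System.out.print(textElement.getText());
    }
    document.close(); // release the PDF once done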