How can tabula (JAR) be called from Java?


Tabula looks like a great tool for extracting tabular data from PDFs. There are plenty of examples of how to call it from the command line or use it from Python, but there doesn't seem to be any documentation for using it from Java. Does anyone have a worked example?

Note that Tabula does provide source code, but it seems inconsistent between versions. For example, the example on GitHub references a TableExtractor class that does not seem to exist in the JAR.

https://github.com/tabulapdf/tabula-java


Solution

  • You can use the following code to call Tabula from Java. It loads the PDF with PDFBox, extracts the first page, and prints the text of every table cell:

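      import java.io.File;
      import java.io.IOException;
      import java.util.List;

      import org.apache.pdfbox.pdmodel.PDDocument;
      import technology.tabula.ObjectExtractor;
      import technology.tabula.Page;
      import technology.tabula.RectangularTextContainer;
      import technology.tabula.Table;
      import technology.tabula.extractors.SpreadsheetExtractionAlgorithm;

      public class TabulaExample {
          public static void main(String[] args) throws IOException {
              final String FILENAME = "../test.pdf";

              // PDDocument.load is the PDFBox 2.x API; try-with-resources closes the file
              try (PDDocument pd = PDDocument.load(new File(FILENAME))) {
                  int totalPages = pd.getNumberOfPages();
                  System.out.println("Total Pages in Document: " + totalPages);

                  ObjectExtractor oe = new ObjectExtractor(pd);
                  SpreadsheetExtractionAlgorithm sea = new SpreadsheetExtractionAlgorithm();
                  Page page = oe.extract(1); // page numbers are 1-based

                  // Detect the tables on the page, then print each cell's text
                  List<Table> tables = sea.extract(page);
                  for (Table table : tables) {
                      List<List<RectangularTextContainer>> rows = table.getRows();
                      for (List<RectangularTextContainer> cells : rows) {
                          for (RectangularTextContainer cell : cells) {
                              System.out.print(cell.getText() + "|");
                          }
                          System.out.println(); // end each table row on its own line
                      }
                  }
              }
          }
      }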
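
If the classes inside the JAR turn out not to match the examples you find, another option is to run the JAR as a child process and read its output, which is the same thing the command-line examples do. The following is only a rough sketch of that idea: the class name `TabulaCliSketch`, the method names, and the JAR and PDF paths are placeholders I made up, and it assumes the tabula-java command-line options `-p` (pages) and `-f` (output format) and a `java` executable on the PATH.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: runs the tabula JAR as a separate process
// instead of calling its classes directly.
class TabulaCliSketch {

    // Builds the command line. jarPath and pdfPath are supplied by the caller;
    // pages is passed to -p unchanged (for example "1" or "all").
    static List<String> buildCommand(String jarPath, String pdfPath, String pages) {
        return Arrays.asList(
                "java", "-jar", jarPath,
                "-p", pages,   // pages to process
                "-f", "CSV",   // output format
                pdfPath);
    }

    // Starts the process and returns everything it wrote to standard output.
    static String run(String jarPath, String pdfPath, String pages)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(jarPath, pdfPath, pages))
                .redirectError(ProcessBuilder.Redirect.INHERIT) // show tabula's errors on our stderr
                .start();
        // readAllBytes requires Java 9 or later
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tabula exited with code " + exitCode);
        }
        return output;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: TabulaCliSketch <tabula.jar> <file.pdf>");
            return;
        }
        System.out.print(run(args[0], args[1], "all"));
    }
}
```

This avoids depending on the library's internal class names, at the cost of starting a new JVM per call and having to parse the CSV text yourself.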