PDFBox library page iteration


I implemented a method that captures a particular area from each page of a PDF document; it takes a PDPage and a Rectangle. Now I want to iterate through the pages and return the first string (text) located at the given coordinates. getPages() returns a PDPageTree, and I am a bit stuck: I can't figure out how to stop at the first page that has text in that area, because right now the loop runs through every page.

public PDPageTree getPages() {
    return getPDDocument().getPages();
}

public String firstInvoiceNumber() throws IOException {
    Rectangle invoiceRectangle = new Rectangle(176, 176, 100, 18);
    String headerTextResult = "";
    for (PDPage pd : getPages()) {
        headerTextResult = StripByArea(pd, invoiceRectangle);
    }
    return headerTextResult;
}

Solution

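  • The loop overwrites headerTextResult on every iteration, so the method returns whatever the last page yields. Break out of the loop as soon as a page produces a non-empty result: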

    public String firstInvoiceNumber() throws IOException {
        Rectangle invoiceRectangle = new Rectangle(176, 176, 100, 18);
        String headerTextResult = "";
        for (PDPage pd : getPages()) {
            headerTextResult = StripByArea(pd, invoiceRectangle);
            if(!"".equals(headerTextResult)) {
                break;
            }
        }
        return headerTextResult;
    }
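
    The same "stop at the first hit" idea can be sketched without PDFBox, which makes it easy to try in isolation. Everything in this sketch is hypothetical and not part of PDFBox: the FirstNonEmpty class, the TextExtractor interface, and the firstNonEmpty method are names invented for illustration. It also assumes you want to skip results that are only whitespace or a line break, which the plain "".equals check above would treat as a match.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PDFBox: returns the first non-blank
// string produced by an extractor over a sequence of items.
final class FirstNonEmpty {

    // Hypothetical callback type; it may throw IOException, as StripByArea does.
    interface TextExtractor<T> {
        String extract(T item) throws IOException;
    }

    static <T> String firstNonEmpty(Iterable<T> items, TextExtractor<T> extractor)
            throws IOException {
        for (T item : items) {
            String text = extractor.extract(item);
            // Assumption: null and whitespace-only results count as "no text".
            if (text != null && !text.trim().isEmpty()) {
                return text.trim(); // stop at the first hit
            }
        }
        return ""; // no item produced any text
    }

    public static void main(String[] args) throws IOException {
        // Strings stand in for pages; the identity lambda stands in for StripByArea.
        List<String> fakePages = Arrays.asList("", "  \n", "INV-001\n", "INV-002");
        System.out.println(firstNonEmpty(fakePages, page -> page)); // prints INV-001
    }
}
```

    Since the for-each loop in the question already iterates over getPages(), the method could presumably be called as firstNonEmpty(getPages(), pd -> StripByArea(pd, invoiceRectangle)) inside firstInvoiceNumber().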