java, unit-testing, pdfbox

Apache PDFBox - how to test if a document is flattened?


I have written the following small Java main method. It takes a PDF document (hardcoded for testing purposes!) that I know contains active form elements, and flattens it.

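import java.io.File;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.tika.Tika;

public static void main(String[] args) {

    // for testing
    String filePath = "<path-to>/<pdf-document-with-active-elements>.pdf";
    // path without the ".pdf" extension
    String basePath = filePath.substring(0, filePath.length() - 4);
    File file = new File(filePath);

    try {
        Tika tika = new Tika();
        if (tika.detect(file).equalsIgnoreCase("application/pdf")) {
            // try-with-resources closes the document even if flattening or saving fails
            try (PDDocument pdDocument = PDDocument.load(file)) {
                PDAcroForm pdAcroForm = pdDocument.getDocumentCatalog().getAcroForm();
                if (pdAcroForm != null) {
                    // refresh first: flatten() removes the fields, so a refresh
                    // afterwards has nothing left to update
                    pdAcroForm.refreshAppearances();
                    pdAcroForm.flatten();

                    pdDocument.save(basePath + "-flattened.pdf");
                }
            }
        }
    }
    catch (Exception e) {
        System.err.println("Exception: " + e.getLocalizedMessage());
    }
}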

What kind of test would assert that the file <path-to>/<pdf-document-with-active-elements>-flattened.pdf generated by this code is, in fact, flat?


Solution

  • What kind of test would assert that the file generated by this code would, in fact, be flat?

    Load that document anew and check whether it has any form fields in its PDAcroForm (if there is a PDAcroForm at all).

    If you want to be thorough, also iterate through the pages and ensure that there are no Widget annotations associated with them anymore.

    And to really be thorough, additionally determine the field positions and contents before flattening and apply text extraction at those positions to the flattened PDF. This verifies that the form has not merely been dropped but has indeed been flattened.
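
The checks described above could be sketched roughly as follows. This is only a minimal sketch, assuming PDFBox 2.x (the same `PDDocument.load` API as in the question). The class name `FlattenCheck` and its methods `isFlattened` and `textInRegion` are hypothetical helpers, not part of PDFBox. The coordinate conversion in `textInRegion` additionally assumes an unrotated page whose crop box starts at the origin, so treat it as a starting point rather than a general solution.

```java
import java.awt.geom.Rectangle2D;
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.text.PDFTextStripperByArea;

// Hypothetical helper class, not part of PDFBox.
public final class FlattenCheck {

    private FlattenCheck() {
    }

    /** Loads the file anew and runs the structural checks on it. */
    public static boolean isFlattened(File file) throws IOException {
        try (PDDocument document = PDDocument.load(file)) {
            return isFlattened(document);
        }
    }

    /** True if there are no form fields and no widget annotations left. */
    public static boolean isFlattened(PDDocument document) throws IOException {
        // Check 1: no AcroForm at all, or an AcroForm without fields.
        PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
        if (acroForm != null && !acroForm.getFields().isEmpty()) {
            return false;
        }
        // Check 2: no page still carries a Widget annotation.
        for (PDPage page : document.getPages()) {
            for (PDAnnotation annotation : page.getAnnotations()) {
                if (annotation instanceof PDAnnotationWidget) {
                    return false;
                }
            }
        }
        return true;
    }

    /**
     * Check 3: extracts the text inside a former field rectangle.
     * The rectangle is given in PDF coordinates (origin bottom-left), while
     * PDFTextStripperByArea expects the origin at the top-left, hence the flip.
     * Assumes an unrotated page with the crop box starting at (0, 0).
     */
    public static String textInRegion(PDDocument document, int pageIndex, PDRectangle rect)
            throws IOException {
        PDPage page = document.getPage(pageIndex);
        float pageHeight = page.getCropBox().getHeight();
        Rectangle2D region = new Rectangle2D.Float(
                rect.getLowerLeftX(),
                pageHeight - rect.getUpperRightY(),
                rect.getWidth(),
                rect.getHeight());
        PDFTextStripperByArea stripper = new PDFTextStripperByArea();
        stripper.addRegion("field", region);
        stripper.extractRegions(page);
        return stripper.getTextForRegion("field").trim();
    }
}
```

In a unit test one would record each field's value (`getValueAsString()`) and widget rectangle (`getWidgets().get(0).getRectangle()`) before flattening, then assert `FlattenCheck.isFlattened(flattenedFile)` and compare the result of `textInRegion` for each recorded rectangle against the recorded value. Exact string equality may be too strict for multi-line or comb fields, so a `contains` comparison is probably the safer assertion.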