I have written the following small Java main method. It takes a PDF document (hardcoded for testing purposes!) that I know contains active form elements, and it needs to flatten that document.
import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.tika.Tika;

public static void main(String[] args) {
    try {
        // for testing
        Tika tika = new Tika();
        String filePath = "<path-to>/<pdf-document-with-active-elements>.pdf";
        // strip the ".pdf" extension
        String fileName = filePath.substring(0, filePath.length() - 4);
        File file = new File(filePath);
        if (tika.detect(file).equalsIgnoreCase("application/pdf")) {
            // try-with-resources closes the document even if flattening throws
            try (PDDocument pdDocument = PDDocument.load(file)) {
                PDAcroForm pdAcroForm = pdDocument.getDocumentCatalog().getAcroForm();
                if (pdAcroForm != null) {
                    // refresh first: after flatten() there are no fields left to refresh
                    pdAcroForm.refreshAppearances();
                    pdAcroForm.flatten();
                    pdDocument.save(fileName + "-flattened.pdf");
                }
            }
        }
    } catch (Exception e) {
        System.err.println("Exception: " + e.getLocalizedMessage());
    }
}
What kind of test would assert that the file <path-to>/<pdf-document-with-active-elements>-flattened.pdf generated by this code is, in fact, flat?
What kind of test would assert that the file generated by this code would, in fact, be flat?
Load that document anew and check whether it has any form fields in its PDAcroForm (if there is a PDAcroForm at all).
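A minimal sketch of that first check, assuming PDFBox 2.x (the version implied by PDDocument.load in the question). The class name FlattenedFormCheck and the method hasNoFormFields are made up for illustration; only the PDFBox calls are real API.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;

// Hypothetical helper, not part of PDFBox.
public class FlattenedFormCheck {

    // True if the document has no AcroForm at all, or an AcroForm without any fields.
    static boolean hasNoFormFields(PDDocument document) {
        PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
        return acroForm == null || acroForm.getFields().isEmpty();
    }

    // Convenience overload: loads the saved file anew, as suggested above.
    static boolean hasNoFormFields(File pdf) throws IOException {
        try (PDDocument document = PDDocument.load(pdf)) {
            return hasNoFormFields(document);
        }
    }
}
```

In a unit test this would be a single assertion on the "-flattened.pdf" file, for example assertTrue(FlattenedFormCheck.hasNoFormFields(new File(...))) with whatever test framework is in use.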
If you want to be thorough, also iterate through the pages and ensure that no Widget annotations are associated with them anymore.
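The page-level check could look roughly like this, again assuming PDFBox 2.x. WidgetAnnotationCheck and countWidgetAnnotations are illustrative names, not library API.

```java
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;

// Hypothetical helper, not part of PDFBox.
public class WidgetAnnotationCheck {

    // Counts the Widget annotations on all pages; a flattened document should yield 0.
    static int countWidgetAnnotations(PDDocument document) throws IOException {
        int count = 0;
        for (PDPage page : document.getPages()) {
            for (PDAnnotation annotation : page.getAnnotations()) {
                // PDFBox instantiates PDAnnotationWidget for annotations with subtype Widget
                if (annotation instanceof PDAnnotationWidget) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

Returning a count rather than a boolean makes a failing assertion message a little more informative.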
And to be really thorough, additionally determine the field positions and contents before flattening, then apply text extraction at those positions to the flattened PDF. This verifies that the form has not merely been dropped but has indeed been flattened.
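For the position-based check, one possible building block is a helper that extracts the text inside a given rectangle of a page. This is only a sketch under assumptions: PDFBox 2.x, unrotated pages, and a rectangle given in PDF user space (as returned by a widget's getRectangle()). FieldAreaText and textInArea are made-up names.

```java
import java.awt.geom.Rectangle2D;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.text.PDFTextStripperByArea;

// Hypothetical helper, not part of PDFBox.
public class FieldAreaText {

    // Returns the text found inside rect on the page with the given zero-based index.
    // Assumes an unrotated page; rotated pages need an extra coordinate transformation.
    static String textInArea(PDDocument document, int pageIndex, PDRectangle rect) throws IOException {
        PDPage page = document.getPage(pageIndex);
        PDRectangle cropBox = page.getCropBox();
        // PDF rectangles have their origin at the bottom left, while the text
        // stripper expects regions measured from the top left of the crop box.
        Rectangle2D region = new Rectangle2D.Float(
                rect.getLowerLeftX() - cropBox.getLowerLeftX(),
                cropBox.getUpperRightY() - rect.getUpperRightY(),
                rect.getWidth(),
                rect.getHeight());
        PDFTextStripperByArea stripper = new PDFTextStripperByArea();
        stripper.addRegion("field", region);
        stripper.extractRegions(page);
        return stripper.getTextForRegion("field").trim();
    }
}
```

Before flattening, one would record each field's getValueAsString() together with the getRectangle() and page index of its widgets; after flattening, textInArea at the same position should return that value. Exact equality may be too strict where the appearance stream formats or truncates the value, so a contains-style comparison is probably the safer assertion.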