I am using the PDFBox 2.x library to remove and add a QR code image after loading a PDF file from the file system. After I remove the QR code, save the modified document, and open it in Adobe Reader, it shows the warning message "An error exists on this page. Acrobat may not display the page correctly". The QR code image is removed successfully, but the warning appears when the file is opened.
Also, before removing the QR code image from the PDF file, the file size was 6.8 MB. After removing the QR code, the file size increased to 8.1 MB.
The modified document without the QR code image should open without the warning message "An error exists on this page. Acrobat may not display the page correctly". The original file opens without any warning.
I also expected the file size to decrease or stay the same after removing the QR code image, not increase.
Can you please help?
Below is the code for removing the QR code image from the PDF file.
    pdDocument = PDDocument.load(new File(aBarcodeVO.getSourceFilePath()));
    newDocument = new PDDocument();
    for (int pageCount = 0; pageCount < pdDocument.getNumberOfPages(); pageCount++) {
        PDPage pdPage = newDocument.importPage(pdDocument.getPage(pageCount));
        String imgUniqueId = aBarcodeVO.getImgUniqueId().concat(String.valueOf(pageCount));
        boolean hasQRCodeOnPage = removeQRCodeImage(newDocument, pdPage, imgUniqueId);
        qRCodePageList.add(hasQRCodeOnPage);
    }
    if (qRCodePageList.contains(true)) {
        newDocument.save(aBarcodeVO.getDestinationFilePath(true));
    }
    newDocument.close();
    pdDocument.close();
    public static boolean removeQRCodeImage(PDDocument document, PDPage page, String imgUniqueId) throws Exception {
        String qrCodeCosName = null;
        PDResources pdResources = page.getResources();
        boolean hasQRCodeOnPage = false;
        for (COSName propertyName : pdResources.getXObjectNames()) {
            if (!pdResources.isImageXObject(propertyName)) {
                continue;
            }
            PDXObject o;
            try {
                o = pdResources.getXObject(propertyName);
                if (o instanceof PDImageXObject) {
                    PDImageXObject pdImageXObject = (PDImageXObject) o;
                    if (pdImageXObject.getMetadata() != null) {
                        DomXmpParser xmpParser = new DomXmpParser();
                        XMPMetadata xmpMetadata = xmpParser.parse(pdImageXObject.getMetadata().toByteArray());
                        if (xmpMetadata.getDublinCoreSchema() != null
                                && StringUtils.isNoneBlank(xmpMetadata.getDublinCoreSchema().getTitle())
                                && xmpMetadata.getDublinCoreSchema().getTitle().contains("_barcodeimg_")) {
                            ((COSDictionary) pdResources.getCOSObject().getDictionaryObject(COSName.XOBJECT))
                                    .removeItem(propertyName);
                            log.debug("propertyName REMOVED--" + propertyName.getName());
                            qrCodeCosName = propertyName.getName();
                            hasQRCodeOnPage = true;
                        }
                    }
                }
            } catch (IOException e) {
                log.error("Exception in removeQRCodeImage() while extracting QR image:" + e, e);
            }
        }
        PDFStreamParser parser = new PDFStreamParser(page);
        parser.parse();
        List<Object> tokens = parser.getTokens();
        log.debug("original tokens size" + tokens.size());
        List<Object> newTokens = new ArrayList<Object>();
        for (int j = 0; j < tokens.size(); j++) {
            Object token = tokens.get(j);
            if (token instanceof Operator) {
                Operator op = (Operator) token;
                // find image - remove it
                if (op.getName().equals("Do")) {
                    COSName cosName = (COSName) tokens.get(j - 1);
                    if (cosName.getName().equals(qrCodeCosName)) {
                        newTokens.remove(newTokens.size() - 1);
                        continue;
                    }
                }
            }
            newTokens.add(token);
        }
        log.debug("tokens size" + newTokens.size());
        PDStream newContents = new PDStream(document);
        OutputStream out = newContents.createOutputStream();
        ContentStreamWriter writer = new ContentStreamWriter(out);
        writer.writeTokens(newTokens);
        out.close();
        page.setContents(newContents);
        return hasQRCodeOnPage;
    }
One possible error: PDF resources can be shared across pages; even the same Resources object may be used by multiple pages. If your document is built that way, manipulating the resources of one page actually manipulates the resources of every page that shares them, while your content stream manipulation changes only a single page. Uses of the same image on other pages could therefore cause the error message you observed.
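To check whether a document shares resources this way, one option is to group the pages by the identity of their resources dictionary. The following is only a minimal sketch that runs without PDFBox: the class `SharedResourceCheck` and its methods are hypothetical names, and plain Java objects stand in for the per-page dictionaries. With PDFBox you would presumably pass `page.getResources().getCOSObject()` for each page, comparing the underlying dictionaries rather than the `PDResources` wrappers.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: finds pages that point at the very same resources object.
final class SharedResourceCheck {

    /** Groups page indexes by the identity (==), not equality, of their resources object. */
    static Map<Object, List<Integer>> pagesByResources(List<?> resourcesPerPage) {
        Map<Object, List<Integer>> groups = new IdentityHashMap<>();
        for (int i = 0; i < resourcesPerPage.size(); i++) {
            groups.computeIfAbsent(resourcesPerPage.get(i), k -> new ArrayList<>()).add(i);
        }
        return groups;
    }

    /** Returns true if at least two pages use the same resources object. */
    static boolean anyShared(List<?> resourcesPerPage) {
        for (List<Integer> pages : pagesByResources(resourcesPerPage).values()) {
            if (pages.size() > 1) {
                return true;
            }
        }
        return false;
    }
}
```

If `anyShared` reports sharing, removing an XObject entry for one page also removes it for the other pages in the same group, so their content streams would need the same treatment (or the page would need its own copy of the resources before editing).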
Another possible error: While iterating over the resources of the page, you remove all matching image XObjects. But while iterating over the instructions of the page, you only remove the showing instructions for one matching image XObject, the last one found. If there are multiple matching image XObjects on a page, showing instructions for some of them may remain while the XObjects themselves are removed; this could also cause the error message observed.
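A way around this is to collect the names of all removed XObjects in a set and drop every `Do` instruction whose operand is in that set. The sketch below illustrates only the filtering logic and runs without PDFBox: `Name` and `Op` are hypothetical stand-ins for PDFBox's `COSName` and `Operator`, and `DoFilter` is an illustrative name, not part of any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for COSName (a name operand such as /Im1).
final class Name {
    final String value;
    Name(String value) { this.value = value; }
}

// Hypothetical stand-in for Operator (such as q, Do, Q).
final class Op {
    final String name;
    Op(String name) { this.name = name; }
}

final class DoFilter {

    /** Returns a copy of tokens without any "/Name Do" pair whose name is in removedNames. */
    static List<Object> dropRemovedImages(List<Object> tokens, Set<String> removedNames) {
        List<Object> kept = new ArrayList<>();
        for (Object token : tokens) {
            if (token instanceof Op && "Do".equals(((Op) token).name) && !kept.isEmpty()) {
                // The operand of Do is the token directly before it.
                Object operand = kept.get(kept.size() - 1);
                if (operand instanceof Name && removedNames.contains(((Name) operand).value)) {
                    kept.remove(kept.size() - 1); // drop the operand ...
                    continue;                     // ... and skip the Do operator itself
                }
            }
            kept.add(token);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Object> tokens = Arrays.asList(
                new Op("q"), new Name("Im1"), new Op("Do"),
                new Name("Im2"), new Op("Do"), new Op("Q"));
        Set<String> removed = new HashSet<>(Arrays.asList("Im1"));
        System.out.println(dropRemovedImages(tokens, removed).size());
    }
}
```

In your method this would mean replacing the single `qrCodeCosName` string with a `Set<String>` that is filled inside the resource loop and consulted inside the token loop. The `instanceof` check on the operand also avoids a `ClassCastException` if the token before `Do` is not a name.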
There might also be other issues. For a more specific analysis please share a representative example PDF.