I have been using PDFBox to split PDF files into images for a while now, but after updating to 2.0.19 I have started running into unexpected exceptions.
This is the stack trace of the exception:
java.lang.ArrayIndexOutOfBoundsException: 3
at java.awt.color.ICC_ColorSpace.toRGB(ICC_ColorSpace.java:191)
at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGB(PDICCBased.java:350)
at org.apache.pdfbox.rendering.PageDrawer.getPaint(PageDrawer.java:335)
at org.apache.pdfbox.rendering.PageDrawer.getNonStrokingPaint(PageDrawer.java:708)
at org.apache.pdfbox.rendering.PageDrawer.fillPath(PageDrawer.java:808)
at org.apache.pdfbox.contentstream.operator.graphics.FillEvenOddRule.process(FillEvenOddRule.java:37)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:875)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:509)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:483)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:156)
at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:269)
at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:321)
at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:243)
at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:203)
at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:190)
Here is the code that I have been using to split the file:
try (PDDocument document = PDDocument.load(new File("updated_test.pdf"))) {
    PDPageTree pdPages = document.getDocumentCatalog().getPages();
    PDFRenderer pdfRenderer = new PDFRenderer(document);
    int page = 0;
    for (PDPage pdPage : pdPages) {
        String fileName = "demo" + page + ".png";
        File tempImg = new File(fileName);
        BufferedImage bim = pdfRenderer.renderImage(page);
        ImageIOUtil.writeImage(bim, tempImg.getAbsolutePath(), 150);
        page++;
    }
} catch (Exception e) {
    e.printStackTrace();
}
And here is the actual file that causes the issue: https://stackoverflowuploads.s3-us-west-2.amazonaws.com/updated_test.pdf
All help, ideas and advice would be greatly appreciated. If you know of other solutions or libraries that can achieve the same result, those would be very useful as well. Thank you!
This has been fixed in PDFBOX-4801 and a snapshot build is available here at the bottom.
It will be in 2.0.20, which is likely to be released in summer (hopefully).
The cause is an incorrect /N value (3) in the dictionary of a CMYK ICC profile; the correct value would have been 4. Because of that, only three color components are passed to a color space that expects four, which later triggers the exception shown above. The corrected code checks the ICC profile and corrects the value in the PDICCBased object.
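To illustrate the mechanism, here is a minimal sketch that uses only the JDK, not PDFBox. It is not the actual PDFBOX-4801 patch; the class and method names (`IccComponentCheck`, `declaredCountMatches`, `effectiveComponentCount`) are made up for this example. Since the JDK does not ship a CMYK profile, the built-in sRGB profile (3 components) with a declared count of 2 stands in for the CMYK profile (4 components) with /N = 3.

```java
import java.awt.color.ColorSpace;
import java.awt.color.ICC_ColorSpace;
import java.awt.color.ICC_Profile;

public class IccComponentCheck {

    /** True when the declared count (the PDF's /N) agrees with the profile itself. */
    public static boolean declaredCountMatches(ICC_Profile profile, int declaredN) {
        return profile.getNumComponents() == declaredN;
    }

    /** Hypothetical repair step: trust the embedded profile over the declared value. */
    public static int effectiveComponentCount(ICC_Profile profile, int declaredN) {
        int actual = profile.getNumComponents();
        if (actual != declaredN) {
            System.err.println("Declared N=" + declaredN + " but profile has " + actual + " components");
        }
        return actual;
    }

    public static void main(String[] args) {
        // Stand-in for the broken file: a 3-component profile declared as having 2.
        ICC_Profile profile = ICC_Profile.getInstance(ColorSpace.CS_sRGB);
        ICC_ColorSpace colorSpace = new ICC_ColorSpace(profile);
        int declaredN = 2;

        // Without a check, the color array is sized from the wrong declared value,
        // and toRGB reads past its end, as in the stack trace from the question.
        try {
            colorSpace.toRGB(new float[declaredN]);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Unchecked call failed: " + e);
        }

        // With the check, the array is sized from the profile and the conversion works.
        int n = effectiveComponentCount(profile, declaredN);
        float[] rgb = colorSpace.toRGB(new float[n]);
        System.out.println("Checked call returned " + rgb.length + " RGB components");
    }
}
```

Until a fixed release is available, the same kind of comparison could be used to detect affected files before rendering, assuming you can get at both the declared count and the embedded profile.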