I am setting up a java project where I use pdfBox to get images out of PDF. Since I am using tika-app for my other functions, I decided to go with pdfBox present inside tika-app-1.20.jar.
I have tried including the jai-imageio-core-1.3.1.jar before,since Tika-app already comes bundled with this jar. I tried with tika-app jar alone.
The line that's throwing error
PDXObject object = resources.getXObject(cosName);
the log trace of the error:
org.apache.pdfbox.filter.MissingImageReaderException: Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed
at org.apache.pdfbox.filter.Filter.findImageReader(Filter.java:163)
at org.apache.pdfbox.filter.JPXFilter.readJPX(JPXFilter.java:115)
at org.apache.pdfbox.filter.JPXFilter.decode(JPXFilter.java:64)
at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:77)
at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:175)
at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:163)
at org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:236)
at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:140)
at org.apache.pdfbox.pdmodel.graphics.PDXObject.createXObject(PDXObject.java:70)
at org.apache.pdfbox.pdmodel.PDResources.getXObject(PDResources.java:426)
But I am quite sure I have jai-imageio-core in tika which turns out to be invisible when I run the code.
Actually, I stumbled upon this error as well but this is mentionned in the PDFBox documentation here. You need to add the following dependencies to your pom.xml
:
<dependency>
<groupId>com.github.jai-imageio</groupId>
<artifactId>jai-imageio-core</artifactId>
<version>1.4.0</version>
</dependency>
<dependency>
<groupId>com.github.jai-imageio</groupId>
<artifactId>jai-imageio-jpeg2000</artifactId>
<version>1.3.0</version>
</dependency>
<!-- Optional for you ; just to avoid the same error with JBIG2 images -->
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>jbig2-imageio</artifactId>
<version>3.0.3</version>
</dependency>
If you are using Gradle :
dependencies {
implementation 'com.github.jai-imageio:jai-imageio-core:1.4.0'
implementation 'com.github.jai-imageio:jai-imageio-jpeg2000:1.3.0'
// Optional for you ; just to avoid the same error with JBIG2 images
implementation 'org.apache.pdfbox:jbig2-imageio:3.0.3'
}