java pdfbox jpeg2000

Image type UNKNOWN with PDFBox and JPEG 2000 sample


I've taken a sample JPEG 2000 from the fnord examples page.

However, when I try to add that image to the PDF:

PDDocument document = new PDDocument();
PDImageXObject pdImage = PDImageXObject.createFromFileByContent(
   new File("samples/relax.jp2"), document);
PDPage page = new PDPage(new PDRectangle(pageWidth, pageHeight));
PDPageContentStream contentStream = new PDPageContentStream(document, page);
contentStream.drawImage(pdImage, matrix);
contentStream.close();

I get the exception:

Caused by: java.lang.IllegalArgumentException: Image type UNKNOWN not supported: relax.jp2
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.createFromFileByContent(PDImageXObject.java:313)

The PDFBox dependencies that I have in Maven:

    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.12</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>fontbox</artifactId>
        <version>2.0.12</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>jempbox</artifactId>
        <version>1.8.16</version>
    </dependency>       
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>jbig2-imageio</artifactId>
        <version>3.0.2</version>
    </dependency>
    <dependency>
        <groupId>com.github.jai-imageio</groupId>
        <artifactId>jai-imageio-core</artifactId>
        <version>1.4.0</version>
    </dependency>
    <dependency>
        <groupId>com.github.jai-imageio</groupId>
        <artifactId>jai-imageio-jpeg2000</artifactId>
        <version>1.3.0</version>
    </dependency>

Am I doing something wrong here? Or is there a problem with PDFBox and/or the sample that I'm using?

Another Apache library, Tika, detects this sample file's MIME type as image/jp2:

TikaConfig tika = new TikaConfig();
Metadata metadata = new Metadata();
MediaType mimetype = tika.getDetector().detect(
     TikaInputStream.get(new FileInputStream("samples/relax.jp2")), metadata);

Solution

  • From PDFBox's API documentation:

    createFromFileByContent()
    The following file types are supported: jpg, jpeg, tif, tiff, gif, bmp and png.

    Looking at the source code, createFromFileByContent() does not ask the underlying image libraries what the file is. It runs PDFBox's own check for known file types, implemented in FileTypeDetector.java.

    This check does not recognize JPEG 2000.

    createFromFileByExtension() might be a better bet, because for some extensions it simply hands the file to ImageIO:

    if ("gif".equals(ext) || "bmp".equals(ext) || "png".equals(ext)) {
        BufferedImage bim = ImageIO.read(file);
        return LosslessFactory.createFromImage(doc, bim);
    }
    

    If you give the file a .gif, .bmp or .png extension so that it takes this branch, and an ImageIO plugin that can read JPEG 2000 is on the classpath, this might work (not tested).
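
A minimal sketch of the same idea without renaming the file: recognize the JP2 container by its leading signature bytes, then let ImageIO decode it. The class name Jp2Check and its methods are hypothetical helpers, not part of PDFBox. The sketch assumes the jai-imageio-jpeg2000 plugin from the question registers a reader for the "jp2" suffix, and it only looks for the 12-byte JP2 signature box, so a raw JPEG 2000 codestream (.j2k) would not match. Like the snippet above, it has not been tested against relax.jp2.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.util.Arrays;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of PDFBox.
final class Jp2Check {

    // The 12-byte signature box that opens a JP2 file.
    private static final byte[] JP2_SIGNATURE = {
        0x00, 0x00, 0x00, 0x0C, 0x6A, 0x50, 0x20, 0x20, 0x0D, 0x0A, (byte) 0x87, 0x0A
    };

    // True if the given leading bytes begin with the JP2 signature box.
    static boolean startsWithJp2Signature(byte[] head) {
        return head.length >= JP2_SIGNATURE.length
                && Arrays.equals(Arrays.copyOf(head, JP2_SIGNATURE.length), JP2_SIGNATURE);
    }

    // Reads the first 12 bytes of the file and checks them against the signature.
    static boolean isJp2(File file) throws IOException {
        byte[] head = new byte[JP2_SIGNATURE.length];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file.toPath())) {
            int n;
            while (filled < head.length
                    && (n = in.read(head, filled, head.length - filled)) != -1) {
                filled += n;
            }
        }
        return filled == head.length && startsWithJp2Signature(head);
    }

    // Assumption: the JPEG 2000 plugin registers itself under the "jp2" suffix.
    static boolean hasJp2Reader() {
        ImageIO.scanForPlugins();
        return ImageIO.getImageReadersBySuffix("jp2").hasNext();
    }

    // Decodes the file with whatever ImageIO reader claims it.
    static BufferedImage decode(File file) throws IOException {
        BufferedImage image = ImageIO.read(file);
        if (image == null) {
            // ImageIO.read returns null when no registered reader accepts the input.
            throw new IOException("No ImageIO reader could decode " + file);
        }
        return image;
    }
}
```

If isJp2() and hasJp2Reader() both return true, the BufferedImage from decode() could be passed to LosslessFactory.createFromImage(document, image), the same call the PDFBox snippet above makes for GIF, BMP and PNG. Note that this stores the decoded pixels losslessly rather than embedding the original JPEG 2000 data.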