Tags: java, image, extract, orientation, pdfbox

PDFBox: extracting images from a PDF document while keeping the image orientation


I found some pretty good solutions in this forum for extracting images from PDF documents with PDFBox. I used the following code snippet, which I found in one post:

    PDPageTree list = document.getPages();
    for (PDPage page : list) {
        PDResources pdResources = page.getResources();
        // Walk every XObject in the page resources and keep only the images
        for (COSName c : pdResources.getXObjectNames()) {
            try {
                PDXObject imageObj = pdResources.getXObject(c);
                if (imageObj instanceof PDImageXObject) {
                    // save image to list
                    BufferedImage bImage = ((PDImageXObject) imageObj).getImage();
                    acceptedImages.add(bImage);
                }
            } catch (MissingImageReaderException mex) {
                log.warn("Missing Image Reader for format: ", mex);
            }
        }
    }

The problem is that, in rare cases, some extracted images have the wrong orientation. When I look at the PDF document, the pictures are displayed correctly, but some of the extracted images are rotated by a multiple of 90°. I guess the rotation information is stored somewhere in the PDF?


Solution

  • Run the PrintImageLocations.java example from the PDFBox source code download and analyse the CTM ("current transformation matrix") that is in effect when each image is drawn. The rotation in degrees can be extracted with Math.round(Math.toDegrees(Math.atan2(ctmNew.getShearY(), ctmNew.getScaleY()))), where ctmNew is the CTM in that example.
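
To illustrate the formula on its own, here is a minimal sketch that uses java.awt.geom.AffineTransform as a stand-in for the PDFBox matrix, so it runs without PDFBox on the classpath. The class name CtmRotation and the helper rotationDegrees are made up for this example, and the normalisation to the range 0 to 359 is my addition, not part of the answer above. With PDFBox 2.x, the matrix should be convertible via its createAffineTransform() method, but please check that against the version you use.

```java
import java.awt.geom.AffineTransform;

// Hypothetical helper class, not part of PDFBox.
final class CtmRotation {

    /**
     * Returns the rotation encoded in a transformation matrix, in degrees,
     * normalised to the range [0, 360). Uses the same atan2(shearY, scaleY)
     * formula as the answer above.
     */
    static int rotationDegrees(AffineTransform ctm) {
        long degrees = Math.round(
                Math.toDegrees(Math.atan2(ctm.getShearY(), ctm.getScaleY())));
        // atan2 yields values in (-180, 180]; map e.g. -90 to 270.
        return (int) (((degrees % 360) + 360) % 360);
    }

    public static void main(String[] args) {
        // Stand-in for an image CTM: a quarter turn combined with
        // scaling to an assumed image size of 200 x 100.
        AffineTransform ctm = AffineTransform.getQuadrantRotateInstance(1);
        ctm.scale(200, 100);
        System.out.println(rotationDegrees(ctm)); // prints 90
    }
}
```

A result of 0 means the image is drawn upright; any other value tells you how far the page content rotates the image when it is displayed, which is the information missing from the raw BufferedImage returned by getImage().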