PDFBox: extracting images from a PDF document while keeping the image orientation

I found some pretty good solutions in this forum for extracting images from PDF documents with PDFBox. I used the following code snippet, which I found in one of the posts:

PDPageTree list = document.getPages();
for (PDPage page : list) {
    PDResources pdResources = page.getResources();
    for (COSName c : pdResources.getXObjectNames()) {
        try {
            PDXObject imageObj = pdResources.getXObject(c);
            if (imageObj instanceof PDImageXObject) {
                // save the decoded image to the list
                BufferedImage bImage = ((PDImageXObject) imageObj).getImage();
                acceptedImages.add(bImage);
            }
        } catch (MissingImageReaderException mex) {
            log.warn("Missing image reader for format", mex);
        }
    }
}

The problem is that, in rare cases, some extracted images have the wrong orientation. When I look at the PDF document, the pictures are displayed correctly, but some of the extracted images are rotated by n × 90°. I guess the rotation information is stored somewhere in the PDF?

Solution

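Run the PrintImageLocations.java example from the PDFBox source code download and analyse the CTM ("current transformation matrix") that is in effect when each image is drawn. The extracted image data itself is stored unrotated; the rotation you see in the viewer comes from the CTM. You can get the rotation in degrees with Math.round(Math.toDegrees(Math.atan2(ctmNew.getShearY(), ctmNew.getScaleY()))).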
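
As a rough sketch of how that formula could be used, the following standalone class computes the angle from the two matrix entries and applies a 90° step rotation to a BufferedImage. It does not depend on PDFBox: the class name OrientationSketch and both method names are made up for illustration, and the two double arguments stand in for ctmNew.getShearY() and ctmNew.getScaleY(). Note that PDF user space has the y axis pointing up while Java images have it pointing down, so the sign of the rotation you need to apply may have to be negated; check it against one of your affected documents.

```java
import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of PDFBox.
class OrientationSketch {

    // Rotation in degrees from the two CTM entries used in the answer.
    // For a rotation by t, shearY is proportional to sin(t) and
    // scaleY is proportional to cos(t).
    static long rotationDegrees(double shearY, double scaleY) {
        return Math.round(Math.toDegrees(Math.atan2(shearY, scaleY)));
    }

    // Returns a copy of src rotated by the given number of 90 degree steps.
    // Assumption: the caller has already decided the rotation direction.
    static BufferedImage rotateByQuadrants(BufferedImage src, int quadrants) {
        int q = Math.floorMod(quadrants, 4);
        boolean swap = (q % 2) == 1;
        int w = swap ? src.getHeight() : src.getWidth();
        int h = swap ? src.getWidth() : src.getHeight();
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = dst.createGraphics();
        // Move to the centre of dst, rotate, then draw src centred there.
        AffineTransform at = new AffineTransform();
        at.translate(w / 2.0, h / 2.0);
        at.quadrantRotate(q);
        at.translate(-src.getWidth() / 2.0, -src.getHeight() / 2.0);
        g.drawImage(src, at, null);
        g.dispose();
        return dst;
    }
}
```

Because an image CTM also carries the image's width and height as scale factors, the atan2 expression is only exact for multiples of 90° (or when both scale factors are equal), which matches the n × 90° case in the question. A call such as rotateByQuadrants(bImage, (int) (rotationDegrees(shearY, scaleY) / 90)) would then turn the extracted image, with the direction caveat above.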