c# pdf itext

How to extract rotated images from PDF with iText


I need to extract images from a PDF. I know that some of the images are rotated 90 degrees (I checked with online tools).

I'm using this code:

PdfRenderListener:

public class PdfRenderListener : IExtRenderListener
{
    // other methods ...

    public void RenderImage(ImageRenderInfo renderInfo)
    {
        try
        {
            var mtx = renderInfo.GetImageCTM();
            var image = renderInfo.GetImage();
            var fillColor = renderInfo.GetCurrentFillColor();
            var color = Color.FromArgb(fillColor?.RGB ?? Color.Empty.ToArgb());
            var fileType = image.GetFileType();
            var extension = "." + fileType;
            var bytes = image.GetImageAsBytes();
            var height = mtx[Matrix.I22];
            var width = mtx[Matrix.I11];

            // rotated image
            if (height == 0 && width == 0)
            {
                var h = Math.Abs(mtx[Matrix.I12]);
                var w = Math.Abs(mtx[Matrix.I21]);
            }

            // save image
        }
        catch (Exception e)
        {
            Console.WriteLine(e);
        }
    }
}

When I save images with this code, the rotated images come out distorted.

I have read the post "iText 7 ImageRenderInfo Matrix contains negative height on Even number Pages" and mkl's answer to it.

In the current transformation matrix (mtx) I have these values:

0 841.9 0
-595.1 0 0
595.1 0 1

I know the image is rotated 90 degrees. How can I transform it to get a normal, upright image?
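
For reference, the rotation and the on-page edge lengths can be read off a matrix like the one above. This is only a minimal sketch, assuming the usual PDF layout [a b 0; c d 0; e f 1], where a, b, c, d correspond to Matrix.I11, Matrix.I12, Matrix.I21 and Matrix.I22, and assuming the matrix contains no shear. The class and method names (CtmInfo, RotationDegrees, EdgeLengths) are hypothetical and not part of iText.

```csharp
using System;

public static class CtmInfo
{
    // Counter-clockwise angle (in PDF user space, y axis pointing up) of the
    // image's width edge, which the matrix maps to the vector (a, b).
    public static double RotationDegrees(double a, double b)
        => Math.Atan2(b, a) * 180.0 / Math.PI;

    // Lengths of the image's width edge (a, b) and height edge (c, d) on the page.
    // Unlike reading I11 and I22 directly, this also works for rotated images.
    public static (double WidthEdge, double HeightEdge) EdgeLengths(
        double a, double b, double c, double d)
        => (Math.Sqrt(a * a + b * b), Math.Sqrt(c * c + d * d));
}
```

With the values from the question (a = 0, b = 841.9, c = -595.1, d = 0), the angle works out to 90 degrees and the edges to 841.9 and 595.1. Note that these are the lengths of the image's own edges; for a 90 degree rotation the bounding box on the page is 595.1 wide and 841.9 high.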


Solution

  • As @mkl mentioned, the real cause was not the rotation of the image but the filter applied to it.

    I analyzed the PDF file with iText RUPS and found that the image was encoded with the CCITTFaxDecode filter.

    Next, I looked for ways to decode this filter and found these questions:

    1. Extracting image from PDF with /CCITTFaxDecode filter.
    2. How to use Bit Miracle LibTiff.Net to write the image to a MemoryStream

    I used the BitMiracle.LibTiff.NET library and wrote this method:

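        // Requires: using System; using System.IO; using BitMiracle.LibTiff.Classic;

        // Wraps the raw CCITT-encoded data in an in-memory TIFF container so that
        // any TIFF-capable reader can decode it. Returns the bytes of that TIFF file.
        private byte[] DecodeInternal(byte[] rawBytes, int width, int height, int k, int bitsPerComponent)
        {
            var compression = GetCompression(k);

            using var ms = new MemoryStream();
            var tms = new TiffStream();

            using var tiff = Tiff.ClientOpen("in-memory", "w", ms, tms);
            tiff.SetField(TiffTag.IMAGEWIDTH, width);
            tiff.SetField(TiffTag.IMAGELENGTH, height);
            tiff.SetField(TiffTag.COMPRESSION, compression);
            tiff.SetField(TiffTag.BITSPERSAMPLE, bitsPerComponent);
            tiff.SetField(TiffTag.SAMPLESPERPIXEL, 1);

            // Write the still-compressed data as a single strip, without re-encoding it.
            var writeResult = tiff.WriteRawStrip(0, rawBytes, rawBytes.Length);
            if (writeResult == -1)
            {
                Console.WriteLine("Decoding error");
            }

            // Flush the TIFF directory to the stream before copying its contents.
            tiff.CheckpointDirectory();
            var decodedBytes = ms.ToArray();
            tiff.Close();

            return decodedBytes;
        }

        // Maps the /K entry of the CCITTFaxDecode parameters to a TIFF compression scheme.
        private Compression GetCompression(int k)
        {
            return k switch
            {
                < 0 => Compression.CCITTFAX4, // K < 0: pure two-dimensional encoding (Group 4)
                0 => Compression.CCITTFAX3,   // K = 0: pure one-dimensional encoding (Group 3, 1-D)
                _ => throw new NotImplementedException("K > 0"), // K > 0: mixed 1-D/2-D, not handled here
            };
        }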
    

    After decoding and rotating the image, I was able to save a normal image. Thanks everyone for the help.
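
The rotation step is not shown above. As a rough illustration only, here is a minimal sketch of a 90 degree counter-clockwise rotation, assuming the image has already been decoded into a row-major buffer with one byte per pixel. The class and method names (PixelRotation, Rotate90CounterClockwise) are hypothetical, and which direction you actually need depends on the transformation matrix of the image.

```csharp
using System;

public static class PixelRotation
{
    // Rotates a row-major, one-byte-per-pixel raster by 90 degrees counter-clockwise.
    // The result has the dimensions swapped: it is `height` pixels wide and `width` pixels high.
    public static byte[] Rotate90CounterClockwise(byte[] src, int width, int height)
    {
        if (src.Length != width * height)
            throw new ArgumentException("Buffer size does not match width * height.");

        var dst = new byte[src.Length];
        for (var y = 0; y < height; y++)
        {
            for (var x = 0; x < width; x++)
            {
                // Source pixel (x, y) lands in row (width - 1 - x), column y of the result.
                dst[(width - 1 - x) * height + y] = src[y * width + x];
            }
        }
        return dst;
    }
}
```

For a 1-bit CCITT image the decoded rows would first have to be unpacked to one byte per pixel before a routine like this applies.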