Tags: c#, itext, jpeg, clipping

Converting images to PDF with iTextSharp while preserving the clipping path


We're looking to convert images to PDF in bulk, programmatically. So far it looks like we will be using iTextSharp, but we have an issue with JPG images that contain a clipping path. We are using the following code in our tests:

using (FileStream fs = new FileStream(output, FileMode.Create, FileAccess.Write, FileShare.None))
{
    using (Document doc = new Document())
    {
        using (PdfWriter writer = PdfWriter.GetInstance(doc, fs))
        {
            doc.Open();
            iTextSharp.text.Image image = iTextSharp.text.Image.GetInstance(source);

            image.SetAbsolutePosition(0, 0);
            doc.SetPageSize(new iTextSharp.text.Rectangle(0, 0, image.Width, image.Height, 0));
            doc.NewPage();

            writer.DirectContent.AddImage(image, false);

            doc.Close();
        }
    }
}

The clipping path in the JPG images seems to be discarded. Is there a way to preserve it? Also, AddImage has an InlineImage option; does anyone know what it does?


Solution

  • iText copies the bytes of a JPG straight into the PDF. Not a single byte is changed. If you say that your JPGs have clipping paths (I've never heard of such a thing) and you don't see that feature in the PDF, you are being confronted with a limitation inherent to PDF, not to iText. iText doesn't even look at the JPG bytes: it just creates a PDF stream object with the filter DCTDecode.

    You will have to apply the clipping path before adding the image to the PDF. Consider how iText handles PNG: PDF has no native PNG support, but PNG supports transparency, so when iText encounters a transparent PNG it processes the image. It creates two images: one opaque image using /FlateDecode and one monochrome image using /FlateDecode. The opaque image is added with the monochrome image as its mask to obtain transparency. I guess you'll have to preprocess your JPG in a similar way.
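
    The masking idea above can be sketched roughly as follows. This is only an illustration under assumptions, not a tested solution: the class name MaskedJpegSketch and the helper BuildEllipseMask are hypothetical, and the ellipse merely stands in for a rasterized clipping path (extracting the real path from the JPG metadata is not shown). The sketch builds an 8-bit grayscale image, marks it as a mask with MakeMask(), and attaches it to the JPG through the ImageMask property, so that pixels with value 0 are hidden and pixels with value 255 stay visible.

```csharp
using System;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class MaskedJpegSketch
{
    // Hypothetical stand-in for a rasterized clipping path: an ellipse
    // inscribed in the image. One byte per pixel, 255 = visible, 0 = hidden.
    public static byte[] BuildEllipseMask(int width, int height)
    {
        byte[] mask = new byte[width * height];
        double rx = width / 2.0;
        double ry = height / 2.0;
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                // Normalized offset of the pixel center from the image center.
                double dx = (x + 0.5 - rx) / rx;
                double dy = (y + 0.5 - ry) / ry;
                bool inside = dx * dx + dy * dy <= 1.0;
                mask[y * width + x] = inside ? (byte)255 : (byte)0;
            }
        }
        return mask;
    }

    public static void Convert(string source, string output)
    {
        Image image = Image.GetInstance(source);
        int w = (int)image.Width;
        int h = (int)image.Height;

        // Raw grayscale image: 1 color component, 8 bits per component.
        Image mask = Image.GetInstance(w, h, 1, 8, BuildEllipseMask(w, h));
        mask.MakeMask();        // mark the grayscale image as a mask
        image.ImageMask = mask; // attach the mask to the JPG

        using (FileStream fs = new FileStream(output, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            // The page takes the size of the image (Image derives from Rectangle).
            using (Document doc = new Document(image))
            {
                using (PdfWriter writer = PdfWriter.GetInstance(doc, fs))
                {
                    doc.Open();
                    image.SetAbsolutePosition(0, 0);
                    writer.DirectContent.AddImage(image);
                    doc.Close();
                }
            }
        }
    }
}
```

    The JPG bytes themselves are still copied unchanged; only the mask is new. A real implementation would replace BuildEllipseMask with code that rasterizes the actual clipping path at the image's pixel dimensions.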

    About inline images:

    Don't use inline images. With an inline image, the image data is stored in the content stream of the PDF instead of as an Image XObject, which is the optimal way of storing images in a PDF. Inline images can only be used for images with a size of 4 KB or less; larger inline images will be forbidden in PDF 2.0. Passing false as the second argument of AddImage, as your code does, stores the image as an XObject.

    Extra remark:

    I think I see a problem in your code. You are creating a document with page size A4:

    Document doc = new Document()
    

    A4 is the default size when you don't pass a parameter to the Document constructor. Afterwards, you try changing the page size like this:

    doc.SetPageSize(new iTextSharp.text.Rectangle(0, 0, image.Width, image.Height, 0));
    doc.NewPage();
    

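    However, because you haven't added any content to the first page yet, the NewPage() call is ignored and the page size is not changed. You are still on page 1 with size A4. To avoid this, pass the image to the Document constructor so that the first page already has the image's size: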

    iTextSharp.text.Image image = iTextSharp.text.Image.GetInstance(source);
    using (FileStream fs = new FileStream(output, FileMode.Create, FileAccess.Write, FileShare.None))
    {
        using (Document doc = new Document(image))
        {
            using (PdfWriter writer = PdfWriter.GetInstance(doc, fs))
            {
                doc.Open();
                image.SetAbsolutePosition(0, 0);
                writer.DirectContent.AddImage(image);
                doc.Close();
            }
        }
    }
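
    Since the question is about converting images in bulk, the corrected code could be wrapped in a method and called once per file, producing one PDF per image. The following is a minimal sketch under assumptions: the names BulkConvertSketch, ConvertOne, and ConvertFolder are hypothetical, as is the choice to pick up only *.jpg files and to name each PDF after its source image.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class BulkConvertSketch
{
    // Same steps as the corrected code: the page is created with the
    // size of the image, so no page-size change is needed afterwards.
    public static void ConvertOne(string source, string output)
    {
        Image image = Image.GetInstance(source);
        using (FileStream fs = new FileStream(output, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            using (Document doc = new Document(image))
            {
                using (PdfWriter writer = PdfWriter.GetInstance(doc, fs))
                {
                    doc.Open();
                    image.SetAbsolutePosition(0, 0);
                    writer.DirectContent.AddImage(image);
                    doc.Close();
                }
            }
        }
    }

    // Hypothetical batch driver: one PDF per JPG found in inputDir.
    public static void ConvertFolder(string inputDir, string outputDir)
    {
        Directory.CreateDirectory(outputDir);
        foreach (string source in Directory.EnumerateFiles(inputDir, "*.jpg"))
        {
            string name = Path.GetFileNameWithoutExtension(source) + ".pdf";
            ConvertOne(source, Path.Combine(outputDir, name));
        }
    }
}
```

    Creating a fresh Document per image sidesteps the NewPage() issue entirely, because every document starts with the right page size.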