
PDF file compression


I have a requirement to dynamically generate and compress large batches of PDF files.

I am considering the usual archive formats:

  • Zip
  • Ace
  • Rar

Any other suggestions are welcome.

My question is: which format is likely to give me the smallest file size? Speed and efficiency are also important factors, but size is my primary concern.

Also, does it make a difference whether I have many small files or fewer, larger files in each archive?

Most of my processing will be done in PHP, but I'm happy to interface with third party executables if needed.
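
For reference, this is roughly how I was planning to build each archive, using PHP's built-in ZipArchive class (the paths below are just placeholders):

    <?php
    // Collect the generated PDFs for this batch (placeholder location).
    $pdfFiles = glob('/tmp/invoices/*.pdf');

    $zip = new ZipArchive();
    if ($zip->open('/tmp/invoices-batch.zip', ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        die('Could not create archive');
    }

    foreach ($pdfFiles as $file) {
        $name = basename($file);
        $zip->addFile($file, $name);
        // Explicitly request DEFLATE compression (PHP 7.0+; it is also
        // the default method, so this line is mostly documentation).
        $zip->setCompressionName($name, ZipArchive::CM_DEFLATE);
    }

    $zip->close();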

Edit:

The documents are primarily invoices and shouldn't contain any images other than the company logo.


Solution

  • I have not had much success compressing PDFs. As pointed out, they are already compressed internally when composed (although some PDF composition tools let you specify a 'compression level'). If at all possible, the first approach you should take is to reduce the size of the composed PDFs themselves; one way of re-compressing a finished file with Ghostscript is sketched below.

    If you keep the PDFs in a single file, they can share any common resources (images, fonts) and so can be significantly smaller. Note that this means one large PDF file, not one large ZIP with multiple PDFs inside.

    In my experience it is quite difficult to compress the images within PDFs, and images make by far the biggest impact on file size. Ensure that you have optimised your images before you start (see the GD sketch below). It is even worth doing a test run without any images, simply to see how much size they are contributing.

    The other component is fonts: if you are using multiple embedded fonts, you are packing more data into the file. Stick to one font to keep the size down, or use fonts that are commonly installed so that you don't need to embed them at all (see the FPDF sketch below).
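
Not from the answer above, but as a concrete illustration of shrinking an already composed PDF: one common approach is to pass the finished file through Ghostscript's pdfwrite device from PHP. A rough sketch, assuming the gs binary is installed; the paths and the /ebook quality preset are placeholders to tune:

    <?php
    // Re-write an existing PDF through Ghostscript.
    // /ebook downsamples images to roughly 150 dpi; /screen is smaller still,
    // /printer and /prepress keep more quality.
    $in  = escapeshellarg('/tmp/invoices/invoice-1234.pdf');        // placeholder input
    $out = escapeshellarg('/tmp/invoices/invoice-1234-small.pdf');  // placeholder output

    $cmd = "gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook"
         . " -dNOPAUSE -dBATCH -dQUIET -sOutputFile=$out $in";

    exec($cmd, $output, $status);
    if ($status !== 0) {
        // Keep the original file if Ghostscript is missing or fails.
        echo "Ghostscript failed with status $status\n";
    }

Ghostscript will also accept several input PDFs in one invocation and write them out as a single merged file, which is handy if you go the one-large-PDF route mentioned above.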
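
On the image point, it can pay to optimise the company logo once, before it is embedded in every invoice. A small sketch using PHP's GD extension; the file names, target width and JPEG quality are assumptions:

    <?php
    // Shrink the logo once and reuse the optimised copy in every PDF.
    $src = imagecreatefrompng('/assets/logo.png');   // placeholder path

    // Downscale to the size it is actually printed at, e.g. 200 px wide.
    $targetWidth  = 200;
    $targetHeight = (int) round(imagesy($src) * $targetWidth / imagesx($src));
    $dst = imagescale($src, $targetWidth, $targetHeight);

    // A moderately compressed JPEG is usually far smaller than a full-size PNG
    // (fine for a logo that doesn't need transparency).
    imagejpeg($dst, '/assets/logo-small.jpg', 75);

    imagedestroy($src);
    imagedestroy($dst);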
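
To illustrate the font point: the standard core fonts (Helvetica, Times, Courier) are never embedded by PHP PDF libraries such as FPDF, so sticking to them keeps every invoice small. A minimal sketch, assuming FPDF 1.8+ is available and with made-up invoice content:

    <?php
    require 'fpdf.php';   // assumes the FPDF library is on the include path

    $pdf = new FPDF();
    $pdf->AddPage();

    // Helvetica is a core font, so nothing is embedded in the file;
    // an embedded TrueType font would add tens of kilobytes per document.
    $pdf->SetFont('Helvetica', '', 11);
    $pdf->Cell(0, 8, 'Invoice #1234', 0, 1);
    $pdf->Cell(0, 8, 'Total: 99.00 EUR', 0, 1);

    // 'F' writes to a local file (parameter order as of FPDF 1.8).
    $pdf->Output('F', '/tmp/invoices/invoice-1234.pdf');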