Tags: pdf, optimization, ghostscript, poppler

PDF optimisation: how does pdftops -passfonts make PDFs load so much faster?


A few weeks ago, our users pointed out that some large OCRed PDFs (generated by ABBYY) load extremely slowly, and asked us to optimise them.

After some investigation, it seems the problem is caused by the complex text embedded in the PDFs. I tried several tools to optimise them, such as Ghostscript and qpdf.

The only approach I found that made a significant improvement was to convert the file to PostScript with pdftops (from Poppler) using the -passfonts option, then convert it back to PDF with Ghostscript's ps2pdf: pdftops -passfonts input.pdf output.ps && ps2pdf output.ps output.pdf
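For repeatability, the same round trip can be wrapped in a small POSIX shell function that stops if either step fails. This is a sketch only: the function name passfonts_roundtrip and the use of a temporary file for the intermediate PostScript are hypothetical choices; only pdftops -passfonts and ps2pdf come from the command above.

```shell
# Hypothetical helper (POSIX sh): round-trips a PDF through PostScript,
# as in the command above. Usage: passfonts_roundtrip input.pdf output.pdf
passfonts_roundtrip() (
    # The ( ) body runs in a subshell, so variables and the trap stay local.
    if [ "$#" -ne 2 ]; then
        echo "usage: passfonts_roundtrip input.pdf output.pdf" >&2
        exit 2
    fi
    if [ ! -f "$1" ]; then
        echo "passfonts_roundtrip: no such file: $1" >&2
        exit 2
    fi
    # Temporary PostScript file, deleted when the subshell exits.
    ps=$(mktemp) || exit 1
    trap 'rm -f "$ps"' EXIT
    # Step 1: PDF -> PostScript, passing non-embedded font references through.
    pdftops -passfonts "$1" "$ps" || exit 1
    # Step 2: PostScript -> PDF with Ghostscript.
    ps2pdf "$ps" "$2" || exit 1
)
```

Usage: passfonts_roundtrip input.pdf output.pdf. Explicit "|| exit 1" checks are used instead of set -e, because bash ignores set -e inside a function that is called as part of an || or && list.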

However, I have no idea why -passfonts makes the PDF load faster, or whether it has side effects I am not aware of.

So, can any PDF gurus shed some light on the reasoning behind this optimisation?

Thank you all in advance!! Jeffrey


Solution

  • From the pdftops man page (http://linux.die.net/man/1/pdftops):

    -passfonts

    By default, references to non-embedded 8-bit fonts in the PDF file are substituted with the closest "Helvetica", "Times-Roman", or "Courier" font. This option passes references to non-embedded fonts through to the PostScript file.

    When the file opens, the reader looks on the system for the non-embedded fonts and loads them if it finds them. The more non-embedded fonts there are, the more lookups it has to make. Fonts are sometimes left unembedded for licensing reasons, sometimes because embedding them would make the file size grow out of proportion, and for various other reasons. By substituting the non-embedded fonts with more common ones, I'd say you are forcing the PDF to load fewer fonts, and possibly fonts with a smaller memory footprint, which leads to a faster load time.

    Compare the font lists before and after the conversion; that may shed more light. In Adobe Acrobat, they are listed under File -> Properties -> Fonts.

    Be cautious with font substitution! It may completely ruin the look and feel of a document.
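To compare the font lists from the command line instead, Poppler (which also provides pdftops) ships pdffonts, whose emb column shows whether each font is embedded. Below is a minimal sketch, assuming the usual pdffonts layout (two header lines, then one line per font with emb as the fifth column from the right); the function names font_list, count_unembedded and compare_fonts are hypothetical.

```shell
# Hypothetical helpers (POSIX sh + awk) for inspecting pdffonts output.
# pdffonts prints two header lines, then one line per font whose last five
# columns are: emb sub uni <object number> <generation>. Indexing from the
# right copes with multi-word type names such as "Type 1C" or "CID TrueType".

# Prints "name emb" for each font read from stdin, sorted and de-duplicated.
font_list() {
    awk 'NR > 2 { print $1, $(NF-4) }' | sort -u
}

# Prints how many font lines read from stdin are not embedded (emb is "no").
count_unembedded() {
    awk 'NR > 2 && $(NF-4) == "no" { n++ } END { print n + 0 }'
}

# Shows which fonts differ between two PDFs: "<" lines appear only in the
# first file, ">" lines only in the second.
compare_fonts() (
    before=$(mktemp) || exit 1
    after=$(mktemp) || exit 1
    trap 'rm -f "$before" "$after"' EXIT
    pdffonts "$1" | font_list > "$before"
    pdffonts "$2" | font_list > "$after"
    diff "$before" "$after"
)
```

Usage: pdffonts input.pdf | count_unembedded gives the number of non-embedded font entries, and compare_fonts input.pdf output.pdf shows which fonts were added, dropped, or changed embedding status by the round trip.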