Reusing graphical assets in PDF files

As part of a project that includes a browser-based visual editor, I use fabric.js to create SVG files which are then converted to PDF files.

In some cases the generated PDF contains the same image many times throughout the document, which causes substantial and unnecessary bloat and results in very large files.

In SVG this is easily fixed by defining a single <image> element and reusing it with <use> elements (as answered in my previous question). However, when I use Inkscape to convert the SVG to PDF, Inkscape doesn't seem to 'get the hint': it re-embeds the repeated image for every single appearance in the document.

The PDF compressor tool at Smallpdf.com seems to be able to fix this issue, but I can't understand how exactly it does this; nor can I replicate this optimisation with Inkscape or any other tool that I know of.

Is there a name for this technique or, better yet, a way for me to replicate it on my own? I read that XObjects are the appropriate tool for this in PDF, but I don't understand how to implement them myself, nor can I find any real examples.

Solution

In general, PDF does provide the same capability as SVG, and in a very similar way, through the use of XObjects.

An XObject can hold an image or, as a form XObject, a group of graphics operators that would otherwise be part of the page content. It is stored as its own stream and referenced by name from the page's resources; a form XObject also carries its own resources, which lets it be a standalone piece of content. The XObject is then painted in the page content using the "Do" operator, which is very similar to what you describe with <use> in SVG.
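To make the "Do" mechanism concrete, here is a minimal sketch in Python that writes a tiny PDF by hand: one image XObject, painted once per placement. The function name build_pdf, the resource name /Im0, the 2x2 test image and the page size are all made up for illustration; a real generator would use a PDF library and a compressed image stream rather than raw bytes.

```python
def build_pdf(placements):
    """Return PDF bytes with ONE image XObject painted once per placement.

    placements: list of (x, y, width, height) tuples in points.
    Illustrative only: names and sizes here are assumptions, not a real API.
    """
    # A 2x2 RGB image as raw samples (red, green, blue, white).
    pixels = bytes([255, 0, 0, 0, 255, 0, 0, 0, 255, 255, 255, 255])

    # Page content: save state, set the placement matrix, paint /Im0, restore.
    content = "".join(
        f"q {w} 0 0 {h} {x} {y} cm /Im0 Do Q\n" for x, y, w, h in placements
    ).encode("ascii")

    def stream(dict_body, data):
        # Wrap data in a PDF stream object body with the correct /Length.
        return (b"<< " + dict_body + b" /Length " + str(len(data)).encode()
                + b" >>\nstream\n" + data + b"\nendstream")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 300 300] "
        b"/Resources << /XObject << /Im0 5 0 R >> >> /Contents 4 0 R >>",
        stream(b"", content),
        # The image is stored exactly once, as object 5.
        stream(b"/Type /XObject /Subtype /Image /Width 2 /Height 2 "
               b"/ColorSpace /DeviceRGB /BitsPerComponent 8", pixels),
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object for the xref table
        out += f"{num} 0 obj\n".encode() + body + b"\nendobj\n"

    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode()
    return bytes(out)


if __name__ == "__main__":
    pdf = build_pdf([(10, 10, 80, 80), (110, 10, 80, 80), (210, 10, 80, 80)])
    with open("reuse.pdf", "wb") as fh:
        fh.write(pdf)
```

Each extra appearance of the image costs only one short line in the content stream; the pixel data itself is never repeated.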

In theory, an XObject can appear once in a PDF file and then be used many times throughout the document without significantly increasing the file size. Whether this happens in practice depends on the library that creates the PDF, or on the optimisation capabilities of the library that post-processes it.

The Adobe PDF Library, for example, can optimise PDF files so that repeated content in XObjects is collapsed: a single copy of the XObject remains, and each use of it in a page description refers to that single object. I've seen examples in variable data scenarios where the file size was reduced from multiple gigabytes to less than a megabyte.

In order to use this, you need:

a PDF file where the repeating content is in fact contained in an XObject
a PDF generator that creates such files correctly, or a PDF processor that is smart enough to optimise the file to take advantage of this
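As a rough illustration of what such an optimisation step might do, the sketch below groups identical image data by hash so that each distinct image is stored once and every occurrence refers to it by name. This is only an assumption about the general idea, not a description of how the Adobe PDF Library or Smallpdf actually work; dedupe_images and the ImN naming are hypothetical.

```python
import hashlib


def dedupe_images(page_images):
    """Collapse repeated image data into single shared entries.

    page_images: list of bytes, one entry per image occurrence.
    Returns (unique, refs): unique maps a resource name to the bytes
    stored once; refs gives the resource name to use for each occurrence.
    """
    name_by_digest = {}
    unique = {}
    refs = []
    for data in page_images:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in name_by_digest:
            # First time this content is seen: store it under a new name.
            name = f"Im{len(unique)}"
            name_by_digest[digest] = name
            unique[name] = data
        refs.append(name_by_digest[digest])
    return unique, refs


if __name__ == "__main__":
    logo = b"\x89fake-image-bytes" * 1000
    other = b"another image"
    unique, refs = dedupe_images([logo, other, logo, logo])
    print(refs, len(unique))
```

In a real PDF the names in refs would become entries in each page's /XObject resource dictionary, all pointing at the same stored object.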