
Adobe Acrobat compressing internal objects?


I have a simple one page PDF document.

Using Adobe Acrobat X (10.1.4), I added 2 graphical annotations (Ink). So far so good.

Now I opened the document in Notepad++ to inspect it. Everything seemed fine. There was the annotations array, and both annotations. All good.

Then I inserted a single space character " " at a random position in the xref table to make the document "invalid". When I opened it in Adobe Acrobat X (version 10.1.4), it still displayed everything as before (apparently after automatically repairing the document) and then asked me whether I would like to save the new version to disk. I did.

Now I opened the document in Notepad++ again, only to find that it looks completely different from how it looked before I made the modification.

The strangest thing is that most of the objects have vanished from the document! There are still references to them, but the actual objects are not there. In addition, there is a lot of Flate-compressed data.

Is it possible that Adobe Acrobat compresses not only streams, but also whole objects, including their "x y obj" and "endobj" tags?


Solution

  • Object streams were introduced to the PDF format in PDF 1.5; cf. section 7.5.7 of the PDF specification ISO 32000-1:2008:

    An object stream, is a stream object in which a sequence of indirect objects may be stored, as an alternative to their being stored at the outermost file level.

    NOTE 1 Object streams are first introduced in PDF 1.5. The purpose of object streams is to allow indirect objects other than streams to be stored more compactly by using the facilities provided by stream compression filters.

    By allowing Adobe Acrobat to save the repaired version of your document, you implicitly allowed it to do so in its preferred format, which uses object streams for the sake of compactness.
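
To make the layout of an object stream more concrete, here is a minimal Python sketch (not what Acrobat does internally). It assumes the stream uses only the FlateDecode filter and that you have already read the /N (object count) and /First (offset of the first object) entries from the stream dictionary. The helper names `build_object_stream` and `parse_object_stream` and the sample objects are made up for illustration. The decoded data starts with N pairs of integers (object number and offset relative to /First), followed by the objects themselves, which are stored without "x y obj" and "endobj".

```python
import zlib


def build_object_stream(objects: dict[int, bytes]) -> tuple[bytes, int, int]:
    """Pack objects into a Flate-compressed object stream payload.

    Returns (compressed data, N, First), mirroring the /N and /First
    entries of the stream dictionary. Hypothetical helper for this demo.
    """
    pairs, body = [], b""
    for number, raw in objects.items():
        # Each header pair is "object-number offset-within-body".
        pairs.append(f"{number} {len(body)}")
        body += raw + b"\n"
    header = " ".join(pairs).encode("ascii") + b"\n"
    return zlib.compress(header + body), len(objects), len(header)


def parse_object_stream(compressed: bytes, n: int, first: int) -> dict[int, bytes]:
    """Return {object number: raw object bytes} from a FlateDecode object stream."""
    data = zlib.decompress(compressed)
    # The first `first` bytes hold n pairs of integers.
    header = data[:first].split()
    numbers = [int(tok) for tok in header[0:2 * n:2]]
    offsets = [int(tok) for tok in header[1:2 * n:2]]
    objects = {}
    for i, (number, offset) in enumerate(zip(numbers, offsets)):
        # An object ends where the next one starts (or at end of data).
        end = first + offsets[i + 1] if i + 1 < n else len(data)
        objects[number] = data[first + offset:end].strip()
    return objects


# Made-up sample objects: an Ink annotation and an annotations array.
sample = {
    11: b"<< /Type /Annot /Subtype /Ink >>",
    7: b"[ 11 0 R ]",
}
compressed, n, first = build_object_stream(sample)
recovered = parse_object_stream(compressed, n, first)
```

In a text editor such a file shows only the compressed stream, which is why objects 11 and 7 in this example would appear to be "missing" even though references to them still resolve.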