
What is the smallest possible valid PDF?


Out of simple curiosity, having seen the smallest GIF, what is the smallest possible valid PDF file?


Solution

  • This is an interesting problem. Taking it by the book, you can start off with this:

    %PDF-1.0
    1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj 2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj 3 0 obj<</Type/Page/MediaBox[0 0 3 3]>>endobj
    xref
    0 4
    0000000000 65535 f
    0000000010 00000 n
    0000000053 00000 n
    0000000102 00000 n
    trailer<</Size 4/Root 1 0 R>>
    startxref
    149
    %EOF
    

    which is 291 bytes of PDF joy. Acrobat opens it, but it complains somewhat. It contains a single page that is 3 × 3 units (3/72" on a side), the minimum allowed by the spec.
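
    The offsets in that xref table can be checked by rebuilding the listing in a script. The sketch below is only an illustration: it assumes CRLF line endings and no newline after %EOF (the combination that reproduces the 291-byte figure), and build_pdf is a made-up helper name.

```python
# Rebuild the first listing and derive its cross-reference offsets.
# Assumptions: CRLF line endings, no newline after %EOF.
NL = b"\r\n"
HEADER = b"%PDF-1.0"
OBJECTS = [
    b"1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj",
    b"2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj",
    b"3 0 obj<</Type/Page/MediaBox[0 0 3 3]>>endobj",
]


def build_pdf():
    """Return (pdf_bytes, object_offsets, xref_offset)."""
    body = HEADER + NL
    offsets = []
    for i, obj in enumerate(OBJECTS):
        offsets.append(len(body))  # byte position where "N 0 obj" starts
        # Objects share one line, separated by spaces, as in the listing.
        sep = NL if i == len(OBJECTS) - 1 else b" "
        body += obj + sep

    xref_offset = len(body)
    size = len(OBJECTS) + 1  # object 0 is the head of the free list
    xref = b"xref" + NL + b"0 %d" % size + NL
    xref += b"0000000000 65535 f" + NL
    for off in offsets:
        # 10-digit offset, 5-digit generation, 'n' for "in use"
        xref += b"%010d 00000 n" % off + NL

    trailer = b"trailer<</Size %d/Root 1 0 R>>" % size + NL
    tail = b"startxref" + NL + b"%d" % xref_offset + NL + b"%EOF"
    return body + xref + trailer + tail, offsets, xref_offset


pdf, offsets, xref_offset = build_pdf()
print(offsets, xref_offset, len(pdf))  # [10, 53, 102] 149 291
```

    Each xref entry comes out at exactly 20 bytes here (18 characters plus the two-byte CRLF), which is the fixed entry width the format expects.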

    However, Acrobat X doesn't even bother with the cross-reference table anymore, so we can take that out, along with startxref and %EOF:

    %PDF-1.0
    1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj 2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj 3 0 obj<</Type/Page/MediaBox[0 0 3 3]>>endobj
    trailer<</Size 4/Root 1 0 R>>
    

    Acrobat complains, but opens it. Now we're at 178 bytes. Turns out that you don't need that /Size in the trailer. Now we're at 172:

    %PDF-1.0
    1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj 2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj 3 0 obj<</Type/Page/MediaBox[0 0 3 3]>>endobj
    trailer<</Root 1 0 R>>
    

    Turns out you don't need all those pesky /Type elements in your dictionaries:

    %PDF-1.0
    1 0 obj<</Pages 2 0 R>>endobj 2 0 obj<</Kids[3 0 R]/Count 1>>endobj 3 0 obj<</MediaBox[0 0 3 3]>>endobj
    trailer<</Root 1 0 R>>
    

    Now we're at 138 bytes.
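
    Dropping the xref table only works because a tolerant reader can find the objects on its own. One plausible way to do that is to scan the file for "N G obj" markers. The sketch below assumes that technique and CRLF line endings; it is not a description of what Acrobat does internally, and scan_objects is a hypothetical name.

```python
import re

# The xref-less listing with the /Type entries removed (CRLF assumed).
data = (
    b"%PDF-1.0\r\n"
    b"1 0 obj<</Pages 2 0 R>>endobj 2 0 obj<</Kids[3 0 R]/Count 1>>endobj "
    b"3 0 obj<</MediaBox[0 0 3 3]>>endobj\r\n"
    b"trailer<</Root 1 0 R>>"
)

# "<object number> <generation> obj" as a standalone token sequence.
OBJ_MARKER = re.compile(rb"\b(\d+)\s+(\d+)\s+obj\b")


def scan_objects(pdf_bytes):
    """Map object number -> byte offset by scanning for 'N G obj' markers."""
    return {int(m.group(1)): m.start() for m in OBJ_MARKER.finditer(pdf_bytes)}


recovered = scan_objects(data)
for number, offset in sorted(recovered.items()):
    print(number, offset)
```

    A scan like this is fragile on real files, since stream data can contain byte sequences that look like object markers, which is one reason the cross-reference table exists in the first place.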

    It also turns out that although the spec says certain values "shall be an indirect reference", that /Count is required, and that the header "must" be %PDF-1.0, some PDF readers don't enforce those rules. This is the smallest I could make it and still have it open in Acrobat X:

    %PDF-1.
    trailer<</Root<</Pages<</Kids[<</MediaBox[0 0 3 3]>>]>>>>>>
    

    70 bytes.

    Now, my editor uses Windows newlines, but Acrobat accepts Windows (\r\n), Mac (\r), or Unix (\n) conventions, so using a hex editor I replaced the \r\n with \r and removed the last newline altogether, which leaves me with 67 bytes:

    25 50 44 46 2D 31 2E 0D 74 72 61 69 6C 65 72 3C 
    3C 2F 52 6F 6F 74 3C 3C 2F 50 61 67 65 73 3C 3C 
    2F 4B 69 64 73 5B 3C 3C 2F 4D 65 64 69 61 42 6F 
    78 5B 30 20 30 20 33 20 33 5D 3E 3E 5D 3E 3E 3E 
    3E 3E 3E 
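
    The same bytes can be produced without a hex editor. This sketch builds both variants (the CRLF file with a trailing newline, and the trimmed one with a single \r) and checks the trimmed one against the dump above; the variable names are just for illustration.

```python
HEADER = b"%PDF-1."
# A trailer whose dictionaries are all written inline, with no indirect objects.
TRAILER = b"trailer<</Root<</Pages<</Kids[<</MediaBox[0 0 3 3]>>]>>>>>>"

# As saved by an editor that uses CRLF and ends the file with a newline.
crlf_version = HEADER + b"\r\n" + TRAILER + b"\r\n"
# After trimming: a lone CR as the separator, no trailing newline.
cr_version = HEADER + b"\r" + TRAILER

HEX_DUMP = """
25 50 44 46 2D 31 2E 0D 74 72 61 69 6C 65 72 3C
3C 2F 52 6F 6F 74 3C 3C 2F 50 61 67 65 73 3C 3C
2F 4B 69 64 73 5B 3C 3C 2F 4D 65 64 69 61 42 6F
78 5B 30 20 30 20 33 20 33 5D 3E 3E 5D 3E 3E 3E
3E 3E 3E
"""
# Strip all whitespace before decoding so this works on any Python 3 version.
dump_bytes = bytes.fromhex("".join(HEX_DUMP.split()))

print(len(crlf_version), len(cr_version))  # 70 67
print(dump_bytes == cr_version)            # True

# Uncomment to write the file out (the filename is arbitrary):
# with open("smallest.pdf", "wb") as f:
#     f.write(cr_version)
```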
    

    I tried taking off the last end-of-dictionary marker (>>), but Acrobat wouldn't have that. The PDF reader built into Google Chrome (Foxit) won't open it.

    As a PostScript (HA! See what I did there?): if you consent to Acrobat "repairing" the file, it grows to 3550 bytes, most of it optional metadata, yet the result still contains a number of clear spec violations.