pdf, filesize

Why is PDF file size so small?


I have a few of my textbooks this semester as PDFs. These are 1000-page computer science textbooks full of graphics. When I downloaded one, it took just a few seconds, which was so fast that I thought something had gone wrong. The entire textbook was 9.7 MB. I opened it up and, sure enough, the entire textbook was there, and all the images and everything else loaded instantly (and I have a really terrible internet connection).

I am just wondering: what amazing compression technique allows you to store a 1000-page textbook in under 10 MB?

Here is a screenshot of the file properties. I am so baffled.


Solution

  • A typical page of text is between 3,000 and 6,000 characters, which in plain ASCII is 3 kB to 6 kB. So the text of your 1000-page book would fit in 6 MB even without compression.

    Normal compression tools can reduce plain ASCII text by something like 60-80%.

    So let's say it's 75%; then you need 0.25 x 6 MB = 1.5 MB for the text. Out of a file of roughly 10 MB, that leaves 8.5 MB for the pictures.

    For vector-based images like SVG that's a lot: they are small and compress as well as text. But 8.5 MB does not leave room for many embedded bitmaps.
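
If you want to play with the numbers, the estimate above can be sketched in a few lines of Python. This is only a back-of-the-envelope calculation using the same round figures as the answer (6 kB per page at the upper end, a 75% saving from compression, and the file size rounded up to 10 MB); the function name `estimate_budget` is made up for illustration and is not part of any PDF tool.

```python
# Rough size budget for a text-heavy PDF, using the answer's round numbers.
# All constants are assumptions for illustration, not measurements.

PAGES = 1000
BYTES_PER_PAGE = 6_000      # upper end of the 3 kB to 6 kB range
COMPRESSION_SAVING = 0.75   # assumed saving, inside the 60-80% range
FILE_SIZE_MB = 10.0         # the 9.7 MB file, rounded up


def estimate_budget(pages, bytes_per_page, saving, file_size_mb):
    """Return (raw text MB, compressed text MB, MB left over for images)."""
    raw_mb = pages * bytes_per_page / 1_000_000
    text_mb = raw_mb * (1 - saving)
    image_mb = file_size_mb - text_mb
    return raw_mb, text_mb, image_mb


raw_mb, text_mb, image_mb = estimate_budget(
    PAGES, BYTES_PER_PAGE, COMPRESSION_SAVING, FILE_SIZE_MB
)

print(f"uncompressed text: {raw_mb:.1f} MB")
print(f"compressed text:   {text_mb:.1f} MB")
print(f"left for images:   {image_mb:.1f} MB")
```

Changing `COMPRESSION_SAVING` to 0.6 or 0.8 shows how little the conclusion depends on the exact ratio: the text stays at a couple of megabytes either way, so almost all of the file is available for graphics.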