python pymupdf

Why is the MuPDF MediaBox of a page smaller than a contained image?


For this example PDF, I did this:

import fitz

doc = fitz.open("PDF-export-example-image-ocr.pdf")

print(f"(1) {doc[0].bound()=}")
print(f"(2) {doc[0].MediaBox=}")
print(f"(3) {doc[0].getImageList()}")


doc.close()

which gives:

(1) doc[0].bound()=Rect(0.0, 0.0, 612.0399780273438, 792.530029296875)

(2) doc[0].MediaBox=Rect(0.0, 0.0, 612.0399780273438, 792.530029296875)

(3) [(15, 0, 1275, 1651, 8, 'DeviceRGB', '', 'R12', 'DCTDecode')]

I expected (1) and (2) to be the same, although I don't understand why there are two ways to get the same value.

What I don't understand is why the image size reported in (3), 1275 x 1651, is so much larger than the page that contains it. Can somebody explain that?


Solution

  • The size you see in (3), 1275 x 1651, is the number of pixels in the embedded JPEG image resource. It has no effect on how big the image is when drawn on the page. The physical size of the image on the page is decided entirely by the page content stream commands that draw it: they scale the image into a rectangle measured in page units (points, 1/72 inch by default), which are the same units the MediaBox uses.
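
To make the pixels-versus-points distinction concrete, here is a minimal sketch of the arithmetic. It assumes, purely for illustration, that the image is drawn to fill the whole page; the question does not confirm this, and the helper name effective_dpi is hypothetical. In practice the placement rectangle would come from the page itself (recent PyMuPDF versions offer page.get_image_rects() for that).

```python
def effective_dpi(pixel_size, placed_size_pt):
    """Return (horizontal, vertical) resolution in dots per inch for an
    image of pixel_size drawn into a rectangle of placed_size_pt points."""
    px_w, px_h = pixel_size
    pt_w, pt_h = placed_size_pt
    # One PDF point is 1/72 inch in default user space.
    return px_w / (pt_w / 72.0), px_h / (pt_h / 72.0)


# Pixel dimensions reported in (3) of the question.
image_pixels = (1275, 1651)

# ASSUMPTION: the image covers the full page, so its placement rectangle
# has the MediaBox width and height from (2).
page_points = (612.0399780273438, 792.530029296875)

dpi_x, dpi_y = effective_dpi(image_pixels, page_points)
print(f"{dpi_x:.1f} x {dpi_y:.1f} dpi")  # prints "150.0 x 150.0 dpi"
```

Under that assumption the numbers are consistent with a 150 dpi scan of a page of roughly 8.5 x 11 inches: the image has more pixels than the page has points, but both describe the same physical area.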