pdf, pdf-generation, pypdf

Are PDF box coordinates relative or absolute?


I want to programmatically edit a PDF using pyPDF. Currently, I'm struggling to interpret the values of the various PDF boxes (TrimBox, MediaBox etc.). Each box is stored as a four-tuple of numbers, e.g.:

TrimBox:           56.69    56.69  1040.31   751.18

According to the PDF specification, these are supposed to describe a rectangle, and certainly (56.69, 56.69) determines the upper left corner of this rectangle. However, is (1040.31, 751.18) to be interpreted as the lower right corner of this rectangle, or as a vector relative to the upper left corner?

Apparently the answer is so well-known among typesetters that I couldn't find it explicitly spelt out anywhere I looked so far.


Solution

  • As Mark Storer and others commented correctly, the four box values are to be interpreted as (left, bottom, right, top), since the PDF format uses absolute coordinates with the origin at the bottom left. So (MediaBox[0], MediaBox[1]) is the bottom left corner and (MediaBox[2], MediaBox[3]) is the upper right corner of the box. MediaBox[2] and MediaBox[3] equal the width and height only if MediaBox[0] and MediaBox[1] are both 0, which should not be relied on. In general, the width is MediaBox[2] - MediaBox[0] and the height is MediaBox[3] - MediaBox[1].

    Moreover, page rotation (the /Rotate entry) changes how the page is displayed rather than the box values themselves, so PDF boxes always refer to the non-rotated page. If the page is rotated by 90 or 270 degrees, you need to swap width and height to obtain the visual dimensions of the box.

    The coordinates are measured in units commonly called points, where by default 1 point is equivalent to 1/72 of an inch. However, this should not be relied on either, because each page can define a custom UserUnit (since PDF 1.6) that scales the unit to UserUnit/72 of an inch, as outlined in the PDF Reference Manual.
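
The three points above can be put together in a small sketch. It is plain Python and deliberately does not call pyPDF; the function name `visual_box_size` and its parameters `rotate` and `user_unit` are made up for illustration, and you would have to read the box, the /Rotate value and the UserUnit from the page yourself, in whatever way your pyPDF version exposes them.

```python
from typing import Sequence, Tuple


def visual_box_size(
    box: Sequence[float],
    rotate: int = 0,
    user_unit: float = 1.0,
) -> Tuple[float, float]:
    """Return the (width, height) of a PDF box in inches, as displayed.

    box       -- four absolute coordinates (left, bottom, right, top)
    rotate    -- page rotation in degrees, assumed to be a multiple of 90
    user_unit -- the page's UserUnit; 1.0 means 1 unit = 1/72 inch
    """
    left, bottom, right, top = (float(v) for v in box)

    # The values are corner coordinates, not a size, so subtract.
    # abs() guards against files that list the corners in another order.
    width = abs(right - left)
    height = abs(top - bottom)

    # Boxes describe the non-rotated page; swap for 90 or 270 degrees.
    if rotate % 180 == 90:
        width, height = height, width

    # Convert user space units to inches.
    scale = user_unit / 72.0
    return width * scale, height * scale


trim_box = (56.69, 56.69, 1040.31, 751.18)
width_in, height_in = visual_box_size(trim_box)
print(f"unrotated: {width_in:.2f} x {height_in:.2f} in")

width_in, height_in = visual_box_size(trim_box, rotate=90)
print(f"rotated 90: {width_in:.2f} x {height_in:.2f} in")
```

For the TrimBox from the question, the width in default units is 1040.31 - 56.69 and the height is 751.18 - 56.69, not 1040.31 and 751.18.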