
Using pypdf to fix orientation problems within a PDF document


I am trying to put a drop-down annotation into a pdf page at coordinates (x,y).

The pdf I want to annotate comes from a scanned document. For some reason my scanner produces a pdf page with the following content:

{'/Type': '/Page', '/Parent': IndirectObject(1, 0, 2500474218768), '/Rotate': 270, '/Resources': IndirectObject(6, 0, 2500474218768), '/MediaBox': [0.0, 0.0, 792, 612], '/CropBox': [0.0, 0.0, 792, 612], '/Contents': [IndirectObject(7, 0, 2500474218768)]}

This page has

/Rotate = 270 
/MediaBox = [0.0, 0.0, 792, 612]

So the pdf displays and prints in portrait (612 x 792), even though the mediabox is landscape (792 x 612).
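To make the mismatch concrete, here is a minimal sketch that computes the size a viewer shows from the two entries above. The helper name `displayed_size` is made up for illustration and is not part of pypdf; it assumes /Rotate is a multiple of 90.

```python
def displayed_size(mediabox, rotate):
    """Return (width, height) of the page as a viewer displays it.

    mediabox is [llx, lly, urx, ury]; rotate is the /Rotate value in
    degrees (a multiple of 90, applied clockwise by the viewer).
    """
    llx, lly, urx, ury = (float(v) for v in mediabox)
    width, height = urx - llx, ury - lly
    if rotate % 360 in (90, 270):
        # A quarter turn swaps the visible width and height.
        return height, width
    return width, height


# Values copied from the page dictionary in the question.
print(displayed_size([0.0, 0.0, 792, 612], 270))  # (612.0, 792.0)
```

With pypdf the same inputs should be available as `page.mediabox` and `page.rotation`.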

To place the annotation at (x,y), I need the coordinate systems to agree.

Instead of changing (x,y) to landscape (not sure how well that would work), I would like to change the orientation of the pdf that was scanned to some standard orientation that would allow me to place the annotation correctly at (x,y).
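For comparison, the alternative of converting (x,y) can be sketched as below. This is only an illustration under stated assumptions: the function name `visual_to_user_space` is hypothetical, the mediabox lower-left corner is taken to be (0, 0) as in the page above, and (x,y) is measured from the bottom-left corner of the page as displayed.

```python
def visual_to_user_space(x, y, width, height, rotate):
    """Map a point on the displayed page to unrotated page coordinates.

    (x, y): point on the page as displayed, origin at its bottom-left.
    width, height: mediabox size (792 and 612 in the question).
    rotate: the page's /Rotate value, a multiple of 90 (clockwise).
    Assumes the mediabox lower-left corner is (0, 0).
    """
    rotate %= 360
    if rotate == 0:
        return x, y
    if rotate == 90:
        return width - y, x
    if rotate == 180:
        return width - x, height - y
    if rotate == 270:
        return y, height - x
    raise ValueError("rotate must be a multiple of 90")


# A point 100 from the left and 700 from the bottom of the portrait view.
print(visual_to_user_space(100, 700, 792, 612, 270))  # (700, 512)
```

This only converts the point. An annotation placed this way still lives in the unrotated frame, so it may end up turned relative to the visible text, which is one reason normalizing the page first can be simpler.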

How can I use pypdf to do this?

I have tried calling page.rotate(-270) and setting page.mediabox.upper_right = (page.mediabox.top, page.mediabox.right)

However the pdf no longer prints correctly after those transformations. I do not understand things well enough to know how to do this correctly.


Solution

  • Thank you very much for your comments on cpdf. Ultimately I did not use cpdf and found a solution using pypdf.

    I would like to share this solution in case anyone might find it useful.

    1. Most likely you are advanced enough at life to realize the direction documents go into your scanner. However, if you're like me, you might be putting your documents into your scanner in landscape position, in which case the generated pdf has the undesired qualities above. Just rotate your documents by hand 90 degrees before you place them into your scanner, no code required.

    2. If you want to correct the undesired rotation in your pdf, and the rotation is exactly like the one I described above, use the following pypdf code.

    import pypdf
    ...
    reader = pypdf.PdfReader(pdf_file)
    writer = pypdf.PdfWriter()
    for page in reader.pages:
        # Bake /Rotate into the page content and boxes, leaving /Rotate at 0.
        page.transfer_rotation_to_content()
        writer.add_page(page)
    ...
    
    3. If that doesn't work, and you're mouth breathing as loudly as I was while trying to figure out, well, what a pdf is: this was the only reference I found that helped me understand how the 3x3 transformation matrix is used to do the transformations you want. It makes sense once you make sense of it, but not a moment before.
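As a rough illustration of the matrix idea in step 3, here is a small pure-Python sketch. It uses the PDF convention of writing a 3x3 matrix as six numbers (a, b, c, d, e, f), where a point maps to (a*x + c*y + e, b*x + d*y + f). The helper names `compose` and `apply` are made up for this example, and this shows only the geometry of turning the 792 x 612 page upright; it is not a description of how pypdf implements `transfer_rotation_to_content()`.

```python
def compose(first, second):
    """Return the matrix that applies `first`, then `second`."""
    a1, b1, c1, d1, e1, f1 = first
    a2, b2, c2, d2, e2, f2 = second
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def apply(matrix, x, y):
    """Transform the point (x, y) with a six-number PDF matrix."""
    a, b, c, d, e, f = matrix
    return a * x + c * y + e, b * x + d * y + f


# /Rotate 270 (clockwise) looks like a quarter turn counter-clockwise,
# which sends (x, y) to (-y, x) ...
ROTATE_90_CCW = (0, 1, -1, 0, 0, 0)
# ... followed by a shift that brings the page back to the origin.
SHIFT_RIGHT_612 = (1, 0, 0, 1, 612, 0)

upright = compose(ROTATE_90_CCW, SHIFT_RIGHT_612)
print(upright)  # (0, 1, -1, 0, 612, 0)

# Corners of the landscape mediabox land on the corners of a 612 x 792 page.
for corner in [(0, 0), (792, 0), (792, 612), (0, 612)]:
    print(corner, "->", apply(upright, *corner))
```

The four corners map to (612, 0), (612, 792), (0, 792) and (0, 0), which are exactly the corners of a portrait page with its lower-left corner at the origin.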

    Note: Answer provided by OP on question section.