ReportLab and pdfrw: Importing Scanned PDF


Using the code below, I am trying to import a PDF page into an existing canvas object and save the result as a PDF. This usually works fine, but when the source PDF was generated from a scanned document, the output is a blank page. Any takers?

from reportlab.lib.units import inch
from reportlab.pdfgen import canvas
from pdfrw import PdfReader
from pdfrw.buildxobj import pagexobj
from pdfrw.toreportlab import makerl

# Out_Folder, pdf_file_name, and folder are assumed to be path strings defined elsewhere.
c = canvas.Canvas(Out_Folder + pdf_file_name)

pages = PdfReader(folder + '2_VisionMissionValues.pdf', decompress=False).pages
p = pagexobj(pages[0])  # Convert the first page to a form XObject
c.setPageSize([11 * inch, 8.5 * inch])  # Set page size (for landscape)
c.doForm(makerl(c, p))
c.showPage()
c.save()

Thanks in advance!


Solution

  • Sooo...

    On the one hand, I have absolutely no idea why this is happening, and not really much time to debug it right now.

    On the other hand, I have a workaround for you. I tried it on v0.3 as well as on the current GitHub master, and it worked in both cases for me.

    I started off by verifying that your code failed on your page and that it worked on another PDF. Then I asked myself "What happens if I use my watermark example to create a PDF with your page as a watermark?" (because that uses some of the same form XObject code). That worked, so then I asked myself "What does it look like if I pass my watermarked page through your reportlab code?"

    Interestingly, the entire watermarked page, including your image, made it through. So I modified your code to do the minimal stuff that the watermark does, which winds up putting a form XObject inside a form XObject when it's passed to reportlab. That worked.

    Here's a slightly modified version of your code that I used for this.

    import sys
    
    from reportlab.pdfgen import canvas
    from pdfrw import PdfReader, PageMerge
    from pdfrw.buildxobj import pagexobj
    from pdfrw.toreportlab import makerl
    
    inch = 72
    
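    fname, = sys.argv[1:]  # Expect exactly one argument: the input PDF
    page = PdfReader(fname, decompress=False).pages[0]
    # Workaround: render the page through PageMerge first, so the form
    # XObject passed to reportlab wraps another form XObject.
    p = pagexobj(PageMerge().add(page).render())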
    
    c = canvas.Canvas('outstuff.pdf')
    c.setPageSize([8.5*inch, 11.0*inch]) #Set page size (for portrait)
    c.doForm(makerl(c, p))
    c.showPage()
    c.save()