
PyPDF2 append file issue


I need to write a script that converts images into PDFs and merges them into one file.

I have tried to use img2pdf and PyPDF2, but I'm getting errors. Could someone take a look and tell me what I'm doing wrong?

import img2pdf
import os
from PyPDF2 import PdfFileReader, PdfFileMerger, PdfFileWriter

merger = PdfFileMerger()
path = input()

for root,dir,files in os.walk(path):
        for eachfile in files:
            if "pdf" not in eachfile:
                os.chdir(root)
                PDFfile = img2pdf.convert((eachfile,), dpi=None, x=None, y=None)
                merger.append(fileobj=PDFfile)
merger.write(open("out.pdf", "wb"))

ERROR

Traceback (most recent call last):
  File "C:/Users/ms/Desktop/Desktop/test.py", line 13, in <module>
    merger.append(fileobj=PDFfile)
  File "C:\Python34\lib\site-packages\PyPDF2\merger.py", line 203, in append
    self.merge(len(self.pages), fileobj, bookmark, pages, import_bookmarks)
  File "C:\Python34\lib\site-packages\PyPDF2\merger.py", line 133, in merge
    pdfr = PdfFileReader(fileobj, strict=self.strict)
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 1065, in __init__
    self.read(stream)
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 1660, in read
    stream.seek(-1, 2)
AttributeError: 'bytes' object has no attribute 'seek'

Solution

  • img2pdf.convert returns the contents of the generated PDF as a bytes object, not a file object. If you read help(merger.append), you will see that you need to pass either a file object or a path to a PDF file. A bytes object has no seek method, hence the AttributeError. Here is a possible solution. It's probably also possible to avoid generating all the intermediate PDF files.

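    import os

    import img2pdf
    from PyPDF2 import PdfFileMerger

    merger = PdfFileMerger()
    path = "/tmp/images"

    for root, dirs, files in os.walk(path):
        for eachfile in files:
            # Skip files that are already PDFs.
            if not eachfile.lower().endswith(".pdf"):
                # Build the full path instead of calling os.chdir(root).
                imgpath = os.path.join(root, eachfile)
                pdfbytes = img2pdf.convert((imgpath,), dpi=None, x=None, y=None)
                # Write the bytes to an intermediate PDF next to the image.
                pdfname = os.path.splitext(imgpath)[0] + ".pdf"
                with open(pdfname, "wb") as f:
                    f.write(pdfbytes)
                # append() accepts a path or a file object, not raw bytes.
                merger.append(pdfname)

    # out.pdf is written to the current working directory.
    with open("out.pdf", "wb") as f:
        merger.write(f)
    merger.close()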
    

    By the way, it would also be much simpler to use standard command-line tools such as ImageMagick's convert, pdfjam, or pdftk.
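
As for skipping the intermediate PDF files mentioned above: one way that should work is to wrap the bytes returned by img2pdf.convert in io.BytesIO, which gives the merger a seekable in-memory file object. This is only a sketch, assuming the same PyPDF2 version as in the question (newer releases rename PdfFileMerger). The helper names as_file_object, find_images and merge_images are made up for illustration.

```python
import io
import os


def as_file_object(pdf_bytes):
    """Wrap raw PDF bytes in a seekable in-memory stream."""
    return io.BytesIO(pdf_bytes)


def find_images(path):
    """Yield full paths of all non-PDF files under path, in sorted order."""
    for root, dirs, files in os.walk(path):
        dirs.sort()  # make the traversal order deterministic
        for name in sorted(files):
            if not name.lower().endswith(".pdf"):
                yield os.path.join(root, name)


def merge_images(path, out_name="out.pdf"):
    """Convert every image under path and merge the results into out_name."""
    # Imported here so the helpers above work without the third-party packages.
    import img2pdf
    from PyPDF2 import PdfFileMerger

    merger = PdfFileMerger()
    for image_path in find_images(path):
        pdf_bytes = img2pdf.convert((image_path,))
        # BytesIO has seek() and read(), which is what append() needs.
        merger.append(fileobj=as_file_object(pdf_bytes))
    with open(out_name, "wb") as f:
        merger.write(f)
    merger.close()
```

Calling merge_images("/tmp/images") would then write out.pdf to the current directory without leaving per-image PDFs behind. Note that every non-PDF file is handed to img2pdf, so the directory should contain only images.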