
PDF generation from a list of images takes too long - Python


I'm trying to generate a PDF from a list of 3 images, but this step is a bottleneck in my program, taking up to 30 seconds per PDF. I need to process a very large number of images, so that is far too slow. None of the solutions I have tried so far have helped much. The three images I'm testing with are 60 KB, 125 KB and 134 KB respectively.

I've tried using PIL, which takes around 27 seconds per PDF. I used the following code:

import os
from PIL import Image

def pil_pdf():  # 27 sec
    downloads = r"C:\Users\USER\Downloads"
    file_nmbr = 3
    imagelist = []
    for i in range(1, file_nmbr + 1):
        # Load each page image (1.png, 2.png, ...) and convert it to RGB
        current_image = Image.open(os.path.join(downloads, f"{i}.png")).convert("RGB")
        imagelist.append(current_image)

    # Save the first image as a PDF and append the rest as extra pages
    out_path = os.path.join(downloads, "out_vPIL.pdf")
    imagelist[0].save(out_path, save_all=True, append_images=imagelist[1:])

... as well as with FPDF:

import os
from fpdf import FPDF

def new_pdf():  # 25 sec
    downloads = r"C:\Users\USER\Downloads"
    file_nmbr = 3
    imagelist = []
    for i in range(1, file_nmbr + 1):
        imagelist.append(os.path.join(downloads, f"{i}.png"))

    pdf = FPDF()  # defaults: portrait, millimetres, A4
    for image in imagelist:
        pdf.add_page()
        # Stretch the image over the full A4 page (210 x 297 mm)
        pdf.image(image, 0, 0, 210, 297)

    pdf.output(os.path.join(downloads, "out.pdf"))

I'd like to get the time down to about 10 seconds per PDF, but nothing I've found so far has helped. Any advice would be extremely welcome.

Thanks so much for any suggestions or recommendations!


Solution

  • My bet is that you will see the best performance with PyMuPDF:

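    import fitz  # PyMuPDF
    
    imglist = [...]  # your list of image filenames
    doc = fitz.open()  # new empty PDF
    
    for ifile in imglist:
        idoc = fitz.open(ifile)  # open the image as a document
        pdfbytes = idoc.convert_to_pdf()  # one-page PDF, held in memory
        doc.insert_pdf(fitz.open("pdf", pdfbytes))  # append that page
    
    # garbage=3 drops unused and duplicate objects; deflate compresses streams
    doc.save("myimages.pdf", garbage=3, deflate=True)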
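
To check whether a given approach actually gets you under your 10-second target, it may help to time the variants side by side on the same three images. The following is only a minimal sketch using the standard library; the helper names `timed` and `compare` are made up for illustration, and the sleep-based workloads are stand-ins that you would replace with your own functions (for example `pil_pdf` and `new_pdf` from the question).

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def compare(candidates):
    """Run each zero-argument callable once and return {name: seconds}."""
    results = {}
    for name, func in candidates.items():
        with timed(name, results):
            func()
    return results


if __name__ == "__main__":
    # Stand-in workloads (hypothetical); swap in your real functions,
    # e.g. {"PIL": pil_pdf, "FPDF": new_pdf}.
    timings = compare({
        "short_sleep": lambda: time.sleep(0.01),
        "longer_sleep": lambda: time.sleep(0.05),
    })
    # Print the fastest candidate first
    for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
        print(f"{name}: {seconds:.3f} s")
```

A single run per candidate is enough when the differences are in the range of seconds, as they are here. Wrapping individual steps (image loading versus saving) in `timed` can also show which part of a function is the slow one.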