python, pypdf

Python: merging PDFs in a for loop over the rows of an Excel sheet


I have an Excel sheet with some dropdown lists (working). Now, in Python, I'm trying to read the data from the Excel sheet (an .xlsx file) and loop over it with a for loop (also working).

I have 3 columns, each containing a name, and each name refers to a PDF file. All the PDF files are located in the same folder. I need to merge the 3 PDF files chosen in a row into one.

I can see that I can use PyPDF2, but how do I do it in my for loop, so that it reads the 3 values row by row and merges the files into one PDF per row?

This is my code at the moment, and I'm getting the right values from the .xlsx sheet row by row:

import os
import pandas as pd
from PyPDF2 import PdfFileMerger

data = pd.read_excel(r'Resources\liste.xlsx', sheet_name='Ark1', skiprows=3)
dataread = pd.DataFrame(data)
for index, row in dataread.iterrows():
    print(index, row)
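The loop above prints each `row` as a whole `pd.Series`. A minimal sketch of pulling out the individual values instead, assuming the first three columns hold the PDF names and the fourth holds the output name (the `row_values` helper and its `n_sources` parameter are hypothetical, not from the question):

```python
import pandas as pd


def row_values(df, n_sources=3):
    """Yield (source_names, output_name) for every usable row of the sheet.

    Assumption: the first `n_sources` columns hold the PDF names and the
    next column holds the name of the merged output file.
    """
    # itertuples(name=None) yields plain tuples, one value per column
    for values in df.iloc[:, : n_sources + 1].itertuples(index=False, name=None):
        # skip rows where any dropdown was left empty (pandas reads it as NaN)
        if any(pd.isna(v) for v in values):
            continue
        names = [str(v).strip() for v in values]
        yield names[:n_sources], names[n_sources]


# usage with the sheet from the question:
# data = pd.read_excel(r'Resources\liste.xlsx', sheet_name='Ark1', skiprows=3)
# for sources, output in row_values(data):
#     print(sources, "->", output)
```

Unpacking by position rather than by column label means the sketch does not depend on what the header cells are called.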

UPDATE!

As @JacoblRR pointed out (https://stackoverflow.com/questions/17104926/pypdf-merging-multiple-pdf-files-into-one-pdf/17304537#17304537), I can see how to feed the files into PyPDF2. My problem is that I'm getting 4 values per row from the Excel sheet, e.g. Value1=u6AB, Value2=FUO0002, Value3=FUO0004, Value4=u34_driblinger.

The PDFs are all in c:\users\myuser\document\master\pdf\. That folder contains u6ABx.pdf, FUO0002_xxxxxxx.pdf and FUO0004_xxxxxxx.pdf, and I want to merge these 3 files into u34_driblinger.pdf.
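Because the cells hold only the start of each file name (`u6AB` for `u6ABx.pdf`), joining the folder and the cell value will not give an existing path. A minimal sketch of resolving a cell value to a file with `glob`, assuming each value is the prefix of exactly one PDF in the folder (`find_pdf` is a hypothetical helper):

```python
import glob
import os


def find_pdf(folder, prefix):
    """Return the path of the one PDF in `folder` whose name starts with `prefix`."""
    # glob.escape keeps characters like [ or * in the names from acting as wildcards
    pattern = os.path.join(glob.escape(folder), glob.escape(prefix) + "*.pdf")
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"no PDF matches {pattern!r}")
    if len(matches) > 1:
        # e.g. "u6AB" would also match a file called "u6ABC.pdf"
        raise ValueError(f"prefix {prefix!r} is ambiguous: {matches}")
    return matches[0]
```

For example, `find_pdf(r"c:\users\myuser\document\master\pdf", "FUO0002")` would return the full path of the `FUO0002_...pdf` file. Raising on more than one match is a deliberate choice: silently picking one of several files would merge the wrong document.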

How can I do that based on the example from the link? Something like:

for index, row in dataread.iterrows():
    print(index, row)
    try:
    # if doc exist then merge
        if os.path.exists(row):
            input = PyPDF2.PdfFileReader(open(row, 'rb'))
            merger.append((input))
        else:
            print(f"problem with file {row}")

    except:
        print("cant merge !! sorry")
    else:
        print(f" {row} Merged !!! ")

merger.write("Merged_doc.pdf")

Solution

  • You cannot pass a DataFrame row, which is a pd.Series, to os.path.exists; it expects a single path. Also, since your Excel sheet contains only file names, you have to build the full file path if your script is not located in the same folder as the PDF files.

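    import os
    from PyPDF2 import PdfMerger  # older PyPDF2 releases name this PdfFileMerger

    # raw string: in a normal string, "\u" in "\users" is a syntax error
    pdf_dir = r'c:\users\myuser\document\master\pdf'
    merger = PdfMerger()

    for index, row in dataread.iterrows():
        print(index, row)
        # row is a pd.Series: take the value in its first column and build the full path
        filepath = os.path.join(pdf_dir, row.iat[0])
        # if doc exists then merge
        if not os.path.exists(filepath):
            print(f"problem with file {filepath}")
            continue
        try:
            merger.append(filepath)
        except Exception as exc:
            print(f"cant merge {filepath}: {exc}")
        else:
            print(f"{filepath} merged")

    merger.write("Merged_doc.pdf")
    merger.close()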