python, python-3.x, pypdf

Combine PDFs with specific names from two different folders using PyPDF2


I have two folders, each containing a different set of PDFs. A PDF with a specific name in the first folder needs to be combined with a PDF with a corresponding name in the second folder. For example, "PID-01.pdf" from the first folder needs to be combined with "FNN-PID-01.pdf" from the second folder, "PID-02.pdf" with "FNN-PID-02.pdf", and so on. I am using the Python module PyPDF2. Could anyone give an example using PyPDF2?


Solution

  • By "combined", do you mean "merged"?

    If so, suppose folder1 contains "PID-01.pdf" and folder2 contains "FNN-PID-01.pdf":

    import os
    from PyPDF2 import PdfFileMerger, PdfFileReader
    folder1 = "/your/path/to/folder1/"
    folder2 = "/your/path/to/folder2/"
    merged_folder = "/your/path/to/merged/folder/"
    
    f1_files = os.listdir(folder1) # ['PID-01.pdf','PID-02.pdf'...etc]
    f2_files = os.listdir(folder2) # ['FNN-PID-01.pdf','FNN-PID-02.pdf'...etc]
    
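    def pdf_merger(f1,f2):
        # Note: PyPDF2 3.x removed the PdfFile* names used here; there the
        # classes are called PdfMerger and PdfReader.
        merger = PdfFileMerger()
        out = os.path.join(merged_folder,f"merged-{f1}")
        # open() replaces the Python 2-only file() builtin; both inputs must
        # stay open until the merged file has been written.
        with open(os.path.join(folder1,f1), 'rb') as fh1, \
             open(os.path.join(folder2,f2), 'rb') as fh2:
            merger.append(PdfFileReader(fh1))
            merger.append(PdfFileReader(fh2))
            merger.write(out)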
    
    # The loop below iterates over each file in folder1 and checks whether a
    # folder2 filename (e.g. "FNN-PID-01.pdf") contains the folder1 filename
    # (e.g. "PID-01.pdf") as a substring. If it does, the two matching files
    # are merged and saved to merged_folder.

    for file1 in f1_files:
        for file2 in f2_files:
            if file1 in file2:
                pdf_merger(file1,file2)
    

    For more advanced matching, you can iterate over the files and write your own matching pattern using a regex.
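As a rough sketch of the regex approach, the pairing step could look something like the following. It assumes the naming scheme from the question ("PID-<digits>.pdf" in the first folder, "FNN-PID-<digits>.pdf" in the second); the names `F1_PATTERN`, `F2_PATTERN`, and `pair_files` are hypothetical helpers, not part of PyPDF2, and the patterns would need adjusting for other naming schemes.

```python
import re

# Assumed naming scheme (taken from the question, adjust as needed):
#   folder1: "PID-01.pdf", "PID-02.pdf", ...
#   folder2: "FNN-PID-01.pdf", "FNN-PID-02.pdf", ...
F1_PATTERN = re.compile(r"^(PID-\d+)\.pdf$")
F2_PATTERN = re.compile(r"^FNN-(PID-\d+)\.pdf$")


def pair_files(f1_files, f2_files):
    """Return (folder1 name, folder2 name) tuples that share the same ID."""
    # Index folder2 files by their ID so each lookup is a dictionary access
    # instead of a nested loop.
    f2_by_id = {}
    for name in f2_files:
        match = F2_PATTERN.match(name)
        if match:
            f2_by_id[match.group(1)] = name

    pairs = []
    for name in sorted(f1_files):
        match = F1_PATTERN.match(name)
        # Skip files that do not follow the scheme or have no counterpart.
        if match and match.group(1) in f2_by_id:
            pairs.append((name, f2_by_id[match.group(1)]))
    return pairs
```

Each resulting pair could then be passed to `pdf_merger(file1, file2)` from the code above. Unlike the substring check, anchored patterns do not match unrelated names that merely contain "PID-01.pdf", and non-PDF files in either folder are ignored.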