Loop through folder and subfolders and merge pdf

I tried to write a script that loops through a parent folder and its subfolders and merges all of the PDFs into one file. Below is the code I have written so far, but I don't know how to combine the pieces into one script.

Reference: Merge PDF files

The first function loops through all of the subfolders under the parent folder to get the path of each PDF.

import os
from PyPDF2 import PdfFileMerger

root = r"folder path"
path = os.path.join(root, "folder path")

def list_dir():
    for path,subdirs,files in os.walk(root):
        for name in files:
            if name.endswith(".pdf") or name.endswith(".ipynb"):
                print (os.path.join(path,name))


Second, I created a list to hold the paths of all the PDF files in the subfolders, so that I can merge them into one combined file. At this step I got:

TypeError: listdir: path should be string, bytes, os.PathLike or None, not list

root_folder = []
def pdf_merge():
    merger = PdfFileMerger()
    allpdfs = [a for a in os.listdir(root_folder)]

    for pdf in allpdfs:
        merger.append(open(pdf, "rb"))
    with open("Combined.pdf","wb") as new_file:
        merger.write(new_file)


What should I change, and where, to avoid the error and combine the two functions?


  • First, create a function that builds a list of all the files and returns it.

    def list_dir(root):
        result = []
        for path, dirs, files in os.walk(root):
            for name in files:
                if name.lower().endswith( (".pdf", ".ipynb") ):
                    result.append(os.path.join(path, name))
        return result

    I also use .lower() to catch extensions like .PDF.

    endswith() accepts a tuple of extensions.

    It is better to pass external values in as arguments: list_dir(root) instead of list_dir().

    Later you can use it like this:

    allpdfs = list_dir("folder path")


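    def pdf_merge(root):
        merger = PdfFileMerger()
        # list_dir() also returns .ipynb files, so keep only the PDFs
        allpdfs = [p for p in list_dir(root) if p.lower().endswith(".pdf")]
        for pdf in allpdfs:
            merger.append(pdf)  # append() accepts a file path as well as a file object
        with open("Combined.pdf", 'wb') as new_file:
            merger.write(new_file)
        merger.close()

    pdf_merge("folder path")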


    The first function could be even more universal if it also took the extensions as an argument:

    import os

    def list_dir(root, exts=None):
        result = []
        for path, dirs, files in os.walk(root):
            for name in files:
                # skip files that do not match the requested extension(s)
                if exts and not name.lower().endswith(exts):
                    continue
                result.append(os.path.join(path, name))
        return result

    all_files  = list_dir('folder_path')
    all_pdfs   = list_dir('folder_path', '.pdf')
    all_images = list_dir('folder_path', ('.png', '.jpg', '.gif'))


    For a single extension you can also use glob:

    import glob
    all_pdfs = glob.glob('folder_path/**/*.pdf', recursive=True)

    It needs ** with recursive=True to search in subfolders.
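
As a variation on the two approaches above, pathlib can do the recursive search while keeping the case-insensitive extension check. On a case-sensitive filesystem a '*.pdf' glob pattern would likely miss files named '.PDF', which is what .lower() guards against. This is only a sketch: the helper name find_pdfs is made up for illustration, and the sorting assumes you want a predictable merge order, since neither os.walk nor glob guarantees one.

```python
from pathlib import Path


def find_pdfs(root):
    """Return the paths of all PDFs under root, at any depth, as sorted strings.

    find_pdfs is a hypothetical helper name, not part of any library.
    """
    return sorted(
        str(p)
        for p in Path(root).rglob("*")                 # walk every entry below root
        if p.is_file() and p.suffix.lower() == ".pdf"  # matches .pdf, .PDF, .Pdf
    )
```

The returned list could be used in place of list_dir(root) inside pdf_merge, in which case the pages end up in path order.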