Search code examples
pythonpdfmergecontains

Merge PDF if name contains X Python


I have two folders, folder 1 contains:

000000.pdf

123456.pdf

987654.pdf

Folder 2 contains:

000000 - Name1.pdf

123456 - Name2.pdf

111111 - Name8.pdf

I want to write a Python script to merge the pdfs if the number in the names of the files is the same, and skip the file if there is no match. And after merging it, it should be saved using the names in Folder 2. What would be the simples way to do this? The number of files might change and not be equal every time, so I was thinking something like

If contains number x:
     merge
Else:
     skip

But I have not got it working yet.


Solution

  • To merge files use PyPDF2. To get folder content you have different options but suggest to use glob. So the solution would look like this:

    from glob import glob
    from pathlib import Path
    
    f1_files = glob(".../*")
    f2_files = glob(".../*")
    
    for f1 in f1_files:
        number = Path(f1).stem # Path module is amazing!
        for f2 in f2_files: 
            if number in f2: # we found number in folder_2!
                merger = PdfFileMerger()
                merger.append(f1)
                merger.append(f2)
                merger.write(f2) # f2 here is the name of result file
                break