Tags: python, pypdf

Where is the combined PDF file?


I have a problem and need your help. I'm learning Python with "Automate the Boring Stuff with Python" and am currently on chapter 13, which deals with PDF files and Word documents. I have this code from the book. It combines PDF files, leaving out the first page of each one. But after I run the program, I don't see any PDF file pop up. I tried to find it in the directory, but it isn't there either. Please help me find that file, thank you! Here's the code:

import PyPDF2
import os
pdfFiles = []
for filename in os.listdir('.'):
if filename.endswith('.pdf'):
    pdfFiles.append(filename)
pdfFiles.sort(key=str.lower)
pdfWriter = PyPDF2.PdfFileWriter()
or filename in pdfFiles:
pdfFileObj = open(filename, 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
for pageNum in range(1, pdfReader.numPages):
    pageObj = pdfReader.getPage(pageNum)
    pdfWriter.addPage(pageObj)
pdfOutput = open('allminutes.pdf', 'wb')
pdfWriter.write(pdfOutput)
pdfOutput.close()

Solution

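  • The code has some missing indentation and a typo: the line "or filename in pdfFiles:" should be "for filename in pdfFiles:", and the bodies of the for and if statements must be indented. After fixing those, I can merge two PDF files as expected.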

    Update

    Since you still can't find the output PDF, let's confirm that it is actually created by printing the merged file's page count.
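Another check, which answers the original question directly: a relative name such as 'allminutes.pdf' is resolved against the current working directory, and that is where both os.listdir('.') looks and open('allminutes.pdf', 'wb') writes. When a script is started from an editor or IDE, the working directory may not be the folder that holds the script. A minimal sketch that prints where the file ends up (describe_output is a hypothetical helper name, not from the book):

```python
import os

OUTPUT_FILE = 'allminutes.pdf'  # relative name, as in the question's script


def describe_output(filename: str) -> str:
    """Report where a relative filename resolves and whether the file exists."""
    # A relative path is resolved against the current working directory,
    # which is not necessarily the folder that contains the script.
    full_path = os.path.abspath(filename)
    if os.path.isfile(full_path):
        size = os.path.getsize(full_path)
        return f"{full_path} exists ({size} bytes)"
    return f"{full_path} was not found"


if __name__ == "__main__":
    print("Current working directory:", os.getcwd())
    print(describe_output(OUTPUT_FILE))
```

Running these lines right after the merge (or adding the two print calls at the end of the original script) shows the exact folder to look in.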

    I keep the input PDF files in a separate folder, input_files.

    merge_pdfs.py iterates over all PDF files in input_files and merges them into allminutes.pdf, skipping the first page of each file.
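The file-collection step on its own can be sketched with the standard library. This is a hypothetical helper (collect_pdfs is not from the book or the answer); it uses os.path.join instead of concatenating with os.sep, sorts once after the loop, and, unlike the original, matches the .pdf extension case-insensitively:

```python
import os
from typing import List


def collect_pdfs(directory: str) -> List[str]:
    """Return paths of the .pdf files in `directory`, sorted case-insensitively."""
    pdf_paths = []
    for filename in os.listdir(directory):
        # Case-insensitive check so "REPORT.PDF" is also picked up
        if filename.lower().endswith('.pdf'):
            # os.path.join builds the path portably (instead of + os.sep +)
            pdf_paths.append(os.path.join(directory, filename))
    # Sort once, after the loop, so the merge order is stable
    pdf_paths.sort(key=str.lower)
    return pdf_paths
```

With the folder layout below, collect_pdfs('input_files') would return the paths of module.pdf and pypi.pdf, in that order.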

    Folder structure before running the code:

    ├── input_files
    │   ├── module.pdf
    │   └── pypi.pdf
    ├── merge_pdfs.py
    ├── requirements.txt
    └── screenshots
        └── demo_output.png
    

    Folder structure after running the code:

    ├── allminutes.pdf
    ├── input_files
    │   ├── module.pdf
    │   └── pypi.pdf
    ├── merge_pdfs.py
    ├── requirements.txt
    └── screenshots
        └── demo_output.png
    

    merge_pdfs.py:

    import os

    import PyPDF2

    pdfFiles = []
    outputFile = 'allminutes.pdf'
    inputFileDirectory = 'input_files'

    # Collect the PDF paths from the input folder
    for filename in os.listdir(inputFileDirectory):
        if filename.endswith('.pdf'):
            pdfFiles.append(os.path.join(inputFileDirectory, filename))
    pdfFiles.sort(key=str.lower)

    # Add every page except the first one of each file.
    # PyPDF2 1.x reads page data from the file on demand, so the input
    # files stay open until pdfWriter.write() has run.
    pdfWriter = PyPDF2.PdfFileWriter()
    openFiles = []
    for filename in pdfFiles:
        pdfFileObj = open(filename, 'rb')
        openFiles.append(pdfFileObj)
        pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
        for pageNum in range(1, pdfReader.numPages):
            pageObj = pdfReader.getPage(pageNum)
            pdfWriter.addPage(pageObj)

    # Write the merged PDF once, after all pages have been added
    with open(outputFile, 'wb') as pdfOutput:
        pdfWriter.write(pdfOutput)
    for pdfFileObj in openFiles:
        pdfFileObj.close()
    print("Done merging the pdf files to {}".format(outputFile))

    # Reopen the result and print how many pages outputFile has
    with open(outputFile, 'rb') as f:
        pdfFile = PyPDF2.PdfFileReader(f)
        print("{} has {} pages.".format(outputFile, pdfFile.getNumPages()))
    

    Generated allminutes.pdf file: [screenshot of allminutes.pdf]

    Output of merge_pdfs.py:

    Done merging the pdf files to allminutes.pdf
    allminutes.pdf has 4 pages.