I have a problem and need your help. I am learning Python with "Automate the Boring Stuff with Python". I am currently on chapter 13, which deals with PDF files and Word documents. I have this code from the book. It basically combines PDF files without their first pages. But after I ran the program, I didn't see any PDF file appear. I tried to find it in the directory, but it is not there either. Please help me find that file, thank you! Here's the code:
import PyPDF2
import os
pdfFiles = []
for filename in os.listdir('.'):
if filename.endswith('.pdf'):
pdfFiles.append(filename)
pdfFiles.sort(key=str.lower)
pdfWriter = PyPDF2.PdfFileWriter()
or filename in pdfFiles:
pdfFileObj = open(filename, 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
for pageNum in range(1, pdfReader.numPages):
pageObj = pdfReader.getPage(pageNum)
pdfWriter.addPage(pageObj)
pdfOutput = open('allminutes.pdf', 'wb')
pdfWriter.write(pdfOutput)
pdfOutput.close()
The code has some missing indentation and a typo ("or" instead of "for" at the start of the second loop). After fixing those, I could merge two PDF files as expected.
Update
Since you still cannot find the output PDF file, let's check whether it is really created by printing the number of pages in the merged PDF file.
I keep my input PDF files in a folder named input_files. merge_pdfs.py iterates over all PDF files in input_files and merges them into allminutes.pdf, skipping the first page of each PDF file.
Before running the code, folder structure:
├── input_files
│   ├── module.pdf
│   └── pypi.pdf
├── merge_pdfs.py
├── requirements.txt
└── screenshots
    └── demo_output.png
After running the code, folder structure:
├── allminutes.pdf
├── input_files
│   ├── module.pdf
│   └── pypi.pdf
├── merge_pdfs.py
├── requirements.txt
└── screenshots
    └── demo_output.png
merge_pdfs.py:
import os

import PyPDF2

outputFile = 'allminutes.pdf'
inputFileDirectory = 'input_files'

# Collect every PDF in the input folder.
pdfFiles = []
for filename in os.listdir(inputFileDirectory):
    if filename.endswith('.pdf'):
        pdfFiles.append(os.path.join(inputFileDirectory, filename))
pdfFiles.sort(key=str.lower)

pdfWriter = PyPDF2.PdfFileWriter()

# Copy every page except the first (index 0) from each PDF.
for filename in pdfFiles:
    pdfFileObj = open(filename, 'rb')
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
    for pageNum in range(1, pdfReader.numPages):
        pageObj = pdfReader.getPage(pageNum)
        pdfWriter.addPage(pageObj)

# Save the merged PDF.
pdfOutput = open(outputFile, 'wb')
pdfWriter.write(pdfOutput)
pdfOutput.close()
print("Done merging the pdf files to {}".format(outputFile))

# Reopen the result and print how many pages it has.
pdfFile = PyPDF2.PdfFileReader(open(outputFile, "rb"))
print("{} has {} pages.".format(outputFile, pdfFile.getNumPages()))
[Screenshot: generated allminutes.pdf file]
Output of merge_pdfs.py:
Done merging the pdf files to allminutes.pdf
allminutes.pdf has 4 pages.
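If the script reports success but you still cannot see the file, it may be that the script runs with a different working directory than the folder you are looking in. That is an assumption on my part, since I cannot see how you launch it. A relative name like allminutes.pdf is resolved against the current working directory, not against the folder that contains the script. A minimal sketch for checking where the file ends up, using only the standard library (describe_output is a hypothetical helper name, not something from the book):

```python
import os


def describe_output(output_file):
    """Return the absolute path of output_file and whether it exists."""
    # Relative paths are resolved against the current working directory.
    absolute_path = os.path.abspath(output_file)
    return absolute_path, os.path.isfile(absolute_path)


if __name__ == "__main__":
    print("Current working directory:", os.getcwd())
    path, exists = describe_output("allminutes.pdf")
    print("Expected output location:", path)
    print("File exists:", exists)
```

Running this from the same place you run the merge script should print the full path to look in.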