I have created two directories, named input and output. The input directory contains several PDF files, and each file has multiple pages. I am trying to extract the first page of every PDF file and save it to the output directory. Below is the code I am trying:

import os
from PyPDF2 import PdfFileWriter, PdfFileReader
in_path = "D:/data/input/"
out_path = "D:/data/output/"
output = PdfFileWriter()
pages_to_keep = [0]
in_files = (f for f in os.listdir(in_path) if os.path.isfile(f) and f.endswith('.pdf'))
for file in in_files:
    po = open(file, 'rb')
    rd = PdfFileReader(po, strict=False)
    for i in pages_to_keep:
        page = rd.getPage(i)
        output.addPage(page)
    with open(out_path+str(file), 'wb') as f:
        output.write(f)

The problem is: when I execute the script, the first output file has 1 page, the second has 2 pages, and the third has 3 pages. But I need only the first page from each PDF file. How can I solve this?

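You need to reset output for each file. Otherwise the single PdfFileWriter created before the loop keeps every page added in earlier iterations, so each output file contains one more page than the previous one: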
for file in in_files:
    output = PdfFileWriter()  # start with an empty writer for each input file
    po = open(file, 'rb')
    rd = PdfFileReader(po, strict=False)
    for i in pages_to_keep:
        page = rd.getPage(i)
        output.addPage(page)
    with open(out_path+str(file), 'wb') as f:
        output.write(f)
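
For completeness, here is a minimal sketch of the whole script with the per-file reset applied. It also joins each file name with its directory, since `os.path.isfile(f)` and `open(file, 'rb')` in the question only work when the script is run from inside the input directory. The function names `list_pdfs` and `extract_pages` are hypothetical, chosen just for this illustration. The sketch keeps the legacy `PdfFileReader` / `PdfFileWriter` names used in the question; as far as I know these were removed in PyPDF2 3.0 in favour of `PdfReader` / `PdfWriter`, so treat the import line as an assumption about your installed version.

```python
import os


def list_pdfs(in_dir):
    """Return the names of the PDF files directly inside in_dir, sorted."""
    return sorted(
        name for name in os.listdir(in_dir)
        # Join with in_dir so the check does not depend on the current working directory.
        if os.path.isfile(os.path.join(in_dir, name)) and name.lower().endswith('.pdf')
    )


def extract_pages(in_dir, out_dir, pages_to_keep=(0,)):
    """Write the selected pages of every PDF in in_dir to a same-named file in out_dir."""
    # Imported here so that list_pdfs can be used without PyPDF2 installed.
    # Legacy names as in the question (assumption: a PyPDF2 version older than 3.0).
    from PyPDF2 import PdfFileReader, PdfFileWriter

    for name in list_pdfs(in_dir):
        writer = PdfFileWriter()  # new writer per input file, so pages never accumulate
        with open(os.path.join(in_dir, name), 'rb') as src:
            reader = PdfFileReader(src, strict=False)
            for i in pages_to_keep:
                writer.addPage(reader.getPage(i))
            # Write while the source file is still open, because the writer
            # reads the page data from it at this point.
            with open(os.path.join(out_dir, name), 'wb') as dst:
                writer.write(dst)
```

With the directories from the question, this would be called as `extract_pages("D:/data/input/", "D:/data/output/")`. The `with` blocks also close each input file, which the original loop leaves open.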