python pypdf

How to extract only the first page of each PDF file in a directory that has multiple files?


I have created two directories named input and output. The input directory contains more than one PDF file, and each file has multiple pages. I am trying to get the first page of every PDF file and save it to the output directory. Below is the code I am trying:

import os

from PyPDF2 import PdfFileWriter, PdfFileReader

in_path = "D:/data/input/"
out_path = "D:/data/output/"

output = PdfFileWriter()
pages_to_keep = [0]

in_files = (f for f in os.listdir(in_path) if os.path.isfile(f) and f.endswith('.pdf'))



for file in in_files:
    po = open(file, 'rb')
    rd = PdfFileReader(po, strict=False)
    for i in pages_to_keep:
        page = rd.getPage(i)
        output.addPage(page)
    with open(out_path+str(file), 'wb') as f:
        output.write(f)

The problem is: when I run the script, the first output file has 1 page, the second has 2 pages, and the third has 3 pages. But I need only the first page from each PDF file. How can I solve this?


Solution

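  • You need to reset output for each file. A single PdfFileWriter created outside the loop keeps every page added so far, so each output file also contains the first pages of all earlier files: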

    for file in in_files:
        output = PdfFileWriter()  # clear output
        po = open(file, 'rb')
        rd = PdfFileReader(po, strict=False)
        for i in pages_to_keep:
            page = rd.getPage(i)
            output.addPage(page)
        with open(out_path+str(file), 'wb') as f:
            output.write(f)