Tags: python, pdf, extract, pypdf

Extract specific pages of a PDF and save them with Python


Based on some sources I found, I tried to write code that extracts page ranges from a PDF and saves each range as a separate PDF file. I have a list that looks like this:

information = [(filename1,startpage1,endpage1), (filename2, startpage2, endpage2), ...,(filename19,startpage19,endpage19)].

This is my code.

from PyPDF2 import PdfFileReader, PdfFileWriter

reader = PdfFileReader("example.pdf")

for page in range(reader.getNumPages() - 1):
    writer = PdfFileWriter()
    start = information[page][1]
    end = information[page][2]
    while start < end:
        writer.addPage(reader.getPage(start))
        start += 1
        output_filename = "{}_{}_page_{}.pdf".format(
            information[page][0], information[page][1], information[page][2]
        )
    with open(output_filename, "wb") as out:
        writer.write(out)

But the output is weird: some files are empty and some contain only one page. How can I fix this?


Solution

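  • I have fixed the issue. The loop condition was missing the equal sign: it has to be start <= end so that the end page is included. The loop also runs over len(information) instead of the page count of the PDF, and getPage() takes start-1 because it counts pages from 0 while my page numbers start at 1.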

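    import os
    import PyPDF2

    pdfReader = PyPDF2.PdfFileReader("example.pdf")
    savepath = "output"  # assumed folder name; not defined in the original post

    for page in range(len(information)):
        pdf_writer = PyPDF2.PdfFileWriter()
        start = information[page][1]
        end = information[page][2]
        # Page numbers are treated as 1-based and inclusive; getPage() is 0-based.
        while start <= end:
            pdf_writer.addPage(pdfReader.getPage(start - 1))
            start += 1
        if not os.path.exists(savepath):
            os.makedirs(savepath)
        output_filename = '{}_{}_page_{}.pdf'.format(information[page][0], information[page][1], information[page][2])
        # Write into savepath so the folder created above is actually used.
        with open(os.path.join(savepath, output_filename), 'wb') as out:
            pdf_writer.write(out)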
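
The PdfFileReader, PdfFileWriter, getPage and addPage names used above are no longer usable as of PyPDF2 3.0, and current releases of the library are published as pypdf. The following is a minimal sketch of the same approach with the newer PdfReader / PdfWriter names. It assumes the page numbers in `information` are 1-based and inclusive, as in the solution above. The helper names `page_indices` and `split_pdf` and the `output_dir` parameter are hypothetical, introduced only for this illustration.

```python
import os


def page_indices(first, last, num_pages):
    """Convert a 1-based, inclusive page range into 0-based page indices."""
    if not 1 <= first <= last <= num_pages:
        raise ValueError(
            "invalid page range {}-{} for a {}-page document".format(
                first, last, num_pages
            )
        )
    return list(range(first - 1, last))


def split_pdf(source_path, information, output_dir="."):
    """Write one PDF per (name, first_page, last_page) tuple in `information`."""
    # Imported here so page_indices() can be used without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(source_path)
    os.makedirs(output_dir, exist_ok=True)
    written = []
    for name, first, last in information:
        # A fresh writer per tuple, so each output holds only its own range.
        writer = PdfWriter()
        for index in page_indices(first, last, len(reader.pages)):
            writer.add_page(reader.pages[index])
        output_path = os.path.join(
            output_dir, "{}_{}_page_{}.pdf".format(name, first, last)
        )
        with open(output_path, "wb") as out:
            writer.write(out)
        written.append(output_path)
    return written
```

Keeping the range conversion in its own function makes the off-by-one logic easy to check in isolation, and the ValueError catches ranges that would otherwise produce empty files. As a usage example, calling split_pdf("example.pdf", [("intro", 1, 3)], "output") would write pages 1 to 3 to output/intro_1_page_3.pdf, provided example.pdf has at least three pages.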