
PDF Page Split - Size of PDF increasing


I have created a PDF splitter using PyPDF2. It splits PDFs larger than 20 MB into multiple smaller PDFs.

My approach is to split the PDF into single-page PDFs and measure the size of each one. I then add up those sizes page by page and start a new output file whenever the running total reaches 20 MB.
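The grouping step described above can be sketched as a greedy pass over the per-page sizes. This is a minimal sketch, assuming the sizes (in bytes) have already been measured, one entry per page in page order. `group_pages_by_size` and `LIMIT_BYTES` are hypothetical names, and reading "20 MB" as 20 × 1024 × 1024 bytes is an assumption.

```python
from typing import List

# Assumed threshold: 20 MB interpreted as 20 * 1024 * 1024 bytes
LIMIT_BYTES = 20 * 1024 * 1024


def group_pages_by_size(page_sizes: List[int], limit: int = LIMIT_BYTES) -> List[List[int]]:
    """Greedily group consecutive page indices so each group's summed size stays <= limit.

    A page that is larger than `limit` on its own ends up in a group by itself.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_total = 0
    for index, size in enumerate(page_sizes):
        # Close the current group if adding this page would exceed the limit
        if current and current_total + size > limit:
            groups.append(current)
            current, current_total = [], 0
        current.append(index)
        current_total += size
    # Flush the last, partially filled group
    if current:
        groups.append(current)
    return groups


# Small example using a limit of 20 "units" instead of bytes
print(group_pages_by_size([8, 7, 6, 25, 3, 4], limit=20))
# [[0, 1], [2], [3], [4, 5]]
```

Because each single-page file repeats any fonts or images it shares with other pages, the sum of single-page sizes usually overestimates the size of the merged chunk, so this check tends to err on the safe side.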

The problem is that some of the single-page PDFs are almost as large as the original PDF, even though extracting the same page manually produces a file of only about 500 KB.

I'm not sure why the size increases. How can I fix this?

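import os
import PyPDF2

pp = "input.pdf"      # source PDF (placeholder name)
newpath = "pages"     # output folder (placeholder name)
os.makedirs(newpath, exist_ok=True)

pdf_reader = PyPDF2.PdfReader(pp)
base = os.path.splitext(os.path.basename(pp))[0]

for i in range(len(pdf_reader.pages)):
    # New PDF for each page
    outputpdf = os.path.join(newpath, f"{base}page{i + 1}.pdf")

    # Fresh PDF writer holding only page i
    output = PyPDF2.PdfWriter()
    output.add_page(pdf_reader.pages[i])

    # Write the single-page PDF to disk
    with open(outputpdf, "wb") as outputStream:
        output.write(outputStream)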

Solution

  • After a lot of trial and error, I found a solution: I used the pdfrw library instead of PyPDF2 to extract each page, and the problem no longer occurs.
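The answer does not include its pdfrw code, so the following is only a minimal sketch of what the per-page extraction might look like with pdfrw, not the author's actual implementation. It assumes pdfrw is installed (`pip install pdfrw`) and uses its `PdfReader(...).pages`, `PdfWriter.addpage` and `PdfWriter.write`; the function name `split_pages_pdfrw` and the output naming scheme are hypothetical, chosen to mirror the question's loop. It also returns each file's size so the results can feed the size-based grouping step.

```python
import os


def split_pages_pdfrw(src_path, out_dir):
    """Write each page of src_path to its own PDF in out_dir using pdfrw.

    Returns a list of (output_path, size_in_bytes) tuples, one per page.
    """
    # pdfrw is a third-party package; it is imported lazily so this module
    # can still be imported when pdfrw is not installed
    from pdfrw import PdfReader, PdfWriter

    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(src_path))[0]
    results = []
    for number, page in enumerate(PdfReader(src_path).pages, start=1):
        out_path = os.path.join(out_dir, f"{base}page{number}.pdf")
        # One writer per page, so each output file holds a single page
        writer = PdfWriter()
        writer.addpage(page)
        writer.write(out_path)
        results.append((out_path, os.path.getsize(out_path)))
    return results
```

For example, `sizes = [size for _, size in split_pages_pdfrw("input.pdf", "pages")]` would give the per-page byte counts to add up against the 20 MB limit.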