with open(pdf, 'rb') as fin:
    reader = PyPDF2.PdfFileReader(fin)
    new_pdf = PyPDF2.PdfFileWriter()
    for i in range(reader.numPages):
        new_pdf.addPage(reader.getPage(i))
    out_file = pdf if not create_copy else self._new_copy(pdf)
    with open(out_file, 'wb') as fout:
        new_pdf.write(fout)
This works as intended when writing a copy.
Now let's move the last three lines out of the with block:
with open(pdf, 'rb') as fin:
    reader = PyPDF2.PdfFileReader(fin)
    new_pdf = PyPDF2.PdfFileWriter()
    for i in range(reader.numPages):
        new_pdf.addPage(reader.getPage(i))
out_file = pdf if not create_copy else self._new_copy(pdf)
with open(out_file, 'wb') as fout:
    new_pdf.write(fout)
This creates a PDF with the correct number of pages, but all the pages are blank, even when writing to a new file. (Note that moving the new_pdf = ... line out as well doesn't change anything.)
Why? And what can I do about it? I expect I will eventually have to move these three lines out of the first with block in order to support overwriting the original file. (Unless I just create a copy anyway and then rename it, which I would rather avoid.)
This is kind of a wild guess, as I am not familiar with the module and have not read the source code.
However, from the documentation, it seems that PdfFileWriter.addPage expects a PageObject, which holds a reference to the PDF file the page belongs to. So my guess is that addPage does not immediately copy the page, but only stores a reference to the page in the original PDF. If that file is closed before the new PDF has been written, the content of the page is lost.
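To make that guess concrete, here is a minimal sketch using only the standard library. It is a toy model and not PyPDF2's actual implementation: LazyPage and load_pages are hypothetical names invented for illustration, and the "pages" are just fixed-size byte ranges. Unlike the blank pages described in the question, the toy fails loudly with a ValueError, but the cause it illustrates is the same: the page data is fetched only when it is needed, and by then the source file is closed.

```python
import io
import os
import tempfile


class LazyPage:
    """Toy stand-in for a page object: stores where its data lives, not the data."""

    def __init__(self, stream, offset, length):
        self.stream = stream
        self.offset = offset
        self.length = length

    def content(self):
        # The bytes are fetched only now, so the stream must still be open.
        self.stream.seek(self.offset)
        return self.stream.read(self.length)


def load_pages(stream, page_size=4):
    """Split a stream into fixed-size lazy pages without reading their contents."""
    size = stream.seek(0, io.SEEK_END)
    return [LazyPage(stream, off, min(page_size, size - off))
            for off in range(0, size, page_size)]


# Create a small throwaway source file holding two 4-byte "pages".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"AAAABBBB")
    path = tmp.name

# Case 1: the pages keep a reference to the file, which is closed before use.
with open(path, "rb") as fin:
    lazy_pages = load_pages(fin)
try:
    [p.content() for p in lazy_pages]
    lazy_failed = False
except ValueError:  # seek/read on a closed file
    lazy_failed = True

# Case 2: copy the bytes into memory first; the buffer outlives the file.
with open(path, "rb") as fin:
    buffer = io.BytesIO(fin.read())
buffered_pages = load_pages(buffer)
buffered_content = [p.content() for p in buffered_pages]

os.remove(path)
```

If the guess is right, the same idea should carry over to the question: PdfFileReader accepts any seekable binary stream, so passing it io.BytesIO(fin.read()) instead of fin would let new_pdf.write(fout) run after the first with block has closed the file, including when out_file is the original pdf.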