Tags: python, python-3.x, file-io, with-statement, pypdf

Why do the with statements for the reader and writer need to be nested?


with open(pdf,'rb') as fin:
    reader = PyPDF2.PdfFileReader(fin)
    new_pdf = PyPDF2.PdfFileWriter()

    for i in range(reader.numPages):
        new_pdf.addPage(reader.getPage(i))

    out_file = pdf if not create_copy else self._new_copy(pdf)
    with open(out_file,'wb') as fout:
        new_pdf.write(fout)

This works as intended when writing a copy.

Now let's move the last three lines out of the with:

with open(pdf,'rb') as fin:
    reader = PyPDF2.PdfFileReader(fin)
    new_pdf = PyPDF2.PdfFileWriter()

    for i in range(reader.numPages):
        new_pdf.addPage(reader.getPage(i))

out_file = pdf if not create_copy else self._new_copy(pdf)
with open(out_file,'wb') as fout:
    new_pdf.write(fout)

This creates a PDF with the correct number of pages, but all the pages are blank, even when writing to a new file. (Note that also moving the new_pdf = ... line out of the with doesn't change anything.)

Why? And what can I do about it? I expect that I will eventually have to move these three lines out of the first with in order to support overwriting the input file. (Unless I always create a copy and then rename it, which I would rather avoid.)


Solution

  • This is something of a guess, as I am not familiar with the module and have not read its source code.

    However, from the documentation, it seems that PdfFileWriter.addPage expects a PageObject, which holds a reference to the PDF file the page belongs to. So my guess is that addPage does not copy the page's content immediately; it only stores a reference to the page in the original PDF. The content would then be read from the input file when new_pdf.write is called, and if that file has already been closed by then, the content of the page is lost.
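
The mechanism guessed at above can be illustrated without PyPDF2. The following is a minimal sketch using only the standard library; LazyPage and demo are hypothetical names invented for this illustration and are not part of PyPDF2. It shows an object that only remembers where its bytes live in a stream and reads them on demand, which fails once the underlying file is closed but keeps working if the data is first copied into an in-memory io.BytesIO buffer. Note that this toy raises ValueError after the file is closed, whereas the question reports blank pages, so the real library's behavior evidently differs in detail.

```python
import io
import os
import tempfile


class LazyPage:
    """Hypothetical stand-in for a page that reads its bytes on demand."""

    def __init__(self, stream, offset, length):
        # Only a reference to the stream is stored; no data is copied here.
        self.stream = stream
        self.offset = offset
        self.length = length

    def content(self):
        # The actual read happens only when the content is requested.
        self.stream.seek(self.offset)
        return self.stream.read(self.length)


def demo():
    # Create a small temporary file to stand in for the input PDF.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"page-one")
    try:
        # Case 1: read while the file is still open (the nested-with layout).
        with open(path, "rb") as fin:
            page = LazyPage(fin, 0, 8)
            inside = page.content()

        # Case 2: read after the with block has closed the file.
        try:
            page.content()
            after_close = "ok"
        except ValueError:  # seeking on a closed file raises ValueError
            after_close = "ValueError"

        # Case 3: copy the bytes into memory first, then close the file.
        with open(path, "rb") as fin:
            buffered = io.BytesIO(fin.read())
        page = LazyPage(buffered, 0, 8)
        from_memory = page.content()
    finally:
        os.remove(path)
    return inside, after_close, from_memory


if __name__ == "__main__":
    print(demo())
```

If the guess is right, the same idea might apply to the code in the question: PdfFileReader is documented to accept any object with read and seek methods, so passing it io.BytesIO(fin.read()) instead of fin should let the input file be closed (and even overwritten) before new_pdf.write is called. This is untested against PyPDF2 itself.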