Tags: python, pdf, io, outputstream, pypdf

How to "write to variable" instead of "to file" in Python


I'm trying to write a function which splits a PDF into separate pages. I copied this simple function from an SO answer:

def splitPdf(file_):
    pdf = PdfFileReader(file_)
    pages = []
    for i in range(pdf.getNumPages()):
        output = PdfFileWriter()
        output.addPage(pdf.getPage(i))
        with open("document-page%s.pdf" % i, "wb") as outputStream:
            output.write(outputStream)
    return pages

This, however, writes the new PDFs to files instead of returning a list of the new PDFs as file variables. So I changed the line output.write(outputStream) to:

pages.append(outputStream)

When I try to write the elements of the pages list, however, I get a ValueError: I/O operation on closed file.

Does anybody know how I can add the new files to the list and return them, instead of writing them to file? All tips are welcome!


Solution

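  • It is not completely clear what you mean by "list of PDFs as file variables". The ValueError occurs because the with block closes outputStream as soon as it exits, so the list ends up holding closed file objects. If you want to create strings with the PDF contents instead of files, and return a list of such strings, replace open() with StringIO and call getvalue() to obtain the contents: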

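    import cStringIO  # Python 2 only; Python 3 provides io.BytesIO instead
    # Assumption: the legacy PyPDF2 (< 3.0) API, which exports these names
    from PyPDF2 import PdfFileReader, PdfFileWriter
    
    def splitPdf(file_):
        pdf = PdfFileReader(file_)
        pages = []
        for i in range(pdf.getNumPages()):
            # One writer per page, so each output holds exactly one page
            output = PdfFileWriter()
            output.addPage(pdf.getPage(i))
            # In-memory buffer standing in for the file opened with open()
            io = cStringIO.StringIO()
            output.write(io)
            # getvalue() returns the whole buffer contents as a string
            pages.append(io.getvalue())
        return pages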
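
    On Python 3 there is no cStringIO module; io.BytesIO plays the same role for binary data. The sketch below shows only the "write to a variable" step in isolation. write_to_bytes and FakeWriter are hypothetical names introduced here, and FakeWriter is just a stand-in so the example runs without a PDF library. It assumes only that the real writer object exposes a write(stream) method, as output does in the code above.

```python
import io


def write_to_bytes(writer):
    """Run writer.write() against an in-memory stream and return the bytes."""
    buffer = io.BytesIO()      # file-like object backed by memory, not disk
    writer.write(buffer)       # same call shape as output.write(outputStream)
    return buffer.getvalue()   # copy the accumulated contents out as bytes


class FakeWriter:
    """Hypothetical stand-in for the PDF writer; it writes a fixed payload."""

    def __init__(self, payload):
        self.payload = payload

    def write(self, stream):
        stream.write(self.payload)


# One bytes object per "page"; in splitPdf, `output` would take FakeWriter's place.
pages = [write_to_bytes(FakeWriter(b"page " + str(i).encode())) for i in range(3)]
```

    If the caller needs file-like objects rather than bytes, one option is to append the BytesIO object itself and call seek(0) on it first, so that later reads start from the beginning. Unlike a file opened in a with block, the buffer stays open until it is closed explicitly or garbage-collected.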