I'm trying to write a function that splits a PDF into separate pages. I copied this simple function from an SO answer:
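```python
from PyPDF2 import PdfFileReader, PdfFileWriter

def splitPdf(file_):
    pdf = PdfFileReader(file_)
    pages = []  # never filled in this version
    for i in range(pdf.getNumPages()):
        output = PdfFileWriter()
        output.addPage(pdf.getPage(i))
        # Write each page to its own file on disk
        with open("document-page%s.pdf" % i, "wb") as outputStream:
            output.write(outputStream)
    return pages
```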
However, this writes the new PDFs to disk instead of returning them as a list of file objects. So I changed the line `output.write(outputStream)` to `pages.append(outputStream)`.

When I then try to write the elements of the `pages` list, however, I get `ValueError: I/O operation on closed file`.

Does anybody know how I can add the new files to the list and return them instead of writing them to disk? All tips are welcome!
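The error can be reproduced without PyPDF2 or any file on disk. The sketch below uses a hypothetical `fake_pdf_write` function as a stand-in for `PdfFileWriter.write`, since that method only needs an object with a `write()` method. It compares collecting streams inside a `with` block, as in the modified loop, with collecting `io.BytesIO` buffers that are left open and rewound.

```python
import io

def fake_pdf_write(stream):
    # Hypothetical stand-in for PdfFileWriter.write(stream): it only
    # calls stream.write(), just like the real writer does.
    stream.write(b"%PDF-1.4\n% placeholder page\n")

def collect_closed(n):
    pages = []
    for _ in range(n):
        # Same shape as the modified loop: the with-block closes the
        # stream as soon as the block exits.
        with io.BytesIO() as stream:
            fake_pdf_write(stream)
            pages.append(stream)
    return pages

def collect_open(n):
    pages = []
    for _ in range(n):
        buf = io.BytesIO()  # in-memory file, deliberately not closed here
        fake_pdf_write(buf)
        buf.seek(0)         # rewind so callers can read from the start
        pages.append(buf)
    return pages

closed = collect_closed(2)
try:
    closed[0].read()
except ValueError as exc:
    print(exc)  # the same "I/O operation on closed file" error

opened = collect_open(2)
print(opened[0].read()[:8])  # b'%PDF-1.4'
```

Storing the buffers is fine as long as nothing closes them; the `with` statement is what turns the stored objects into closed files.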
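It is not completely clear what you mean by "list of PDFs as file variables". The `ValueError` occurs because the `with` statement closes `outputStream` as soon as the block ends, so the list ends up holding closed file objects. If you want to create strings with the PDF contents instead of files, and return a list of such strings, replace `open()` with `StringIO` and call `getvalue()` to obtain the contents: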
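```python
import cStringIO  # Python 2 only
from PyPDF2 import PdfFileReader, PdfFileWriter

def splitPdf(file_):
    pdf = PdfFileReader(file_)
    pages = []
    for i in range(pdf.getNumPages()):
        output = PdfFileWriter()
        output.addPage(pdf.getPage(i))
        buf = cStringIO.StringIO()  # in-memory file instead of open()
        output.write(buf)
        pages.append(buf.getvalue())  # raw bytes of the one-page PDF
    return pages
```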