My script generates a PDF (a `PyPDF2.pdf.PdfFileWriter` object) and stores it in a variable. I need to work with it as a file-like object later in the script, but right now I have to write it to disk first and then open it as a file to work with it.
To avoid these unnecessary write/read operations, I looked at several solutions (`StringIO`, `BytesIO` and so on), but I cannot work out which one helps in my case.

As far as I understand, I need to "convert" the `PyPDF2.pdf.PdfFileWriter` object to a file-like object (that is, write it to RAM) so I can work with it directly. Or is there another method that fits my case better?
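For reference, a minimal stdlib-only sketch of the difference between the two candidates: `io.StringIO` holds `str` (text), while `io.BytesIO` holds `bytes`, so for binary data such as a PDF only `BytesIO` applies. The sample bytes and the helper `stringio_accepts_bytes` are illustrative placeholders, not part of PyPDF2 or pdfrw.

```python
import io

# Placeholder bytes standing in for the PDF data; this is NOT a valid PDF.
pdf_bytes = b"%PDF-1.4\n% placeholder content\n"

# BytesIO is an in-memory binary file: it supports write(), seek(), read(), etc.
buf = io.BytesIO()
buf.write(pdf_bytes)
buf.seek(0)  # rewind before reading, exactly as with a real file
print(buf.read() == pdf_bytes)  # True


def stringio_accepts_bytes() -> bool:
    """Hypothetical helper: return True if io.StringIO can store bytes (it only accepts str)."""
    try:
        io.StringIO().write(pdf_bytes)
    except TypeError:
        return False
    return True


print(stringio_accepts_bytes())  # False
```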
UPDATE: here is a code sample:
```python
from pdfrw import PdfReader, PdfWriter, PageMerge
from PyPDF2 import PdfFileReader, PdfFileWriter

red_file = PdfFileReader(open("file_name.pdf", 'rb'))
large_pages_indexes = [1, 7, 9]
large = PdfFileWriter()
for i in large_pages_indexes:
    p = red_file.getPage(i)
    large.addPage(p)

# here the final data has to be written to disk (I would like to avoid that)
with open("virtual_file.pdf", 'wb') as tmp:
    large.write(tmp)

# here I need to read the exported "virtual_file.pdf" back (I would like to avoid that too)
with open("virtual_file.pdf", 'rb') as tmp:
    pdf = PdfReader(tmp)  # here I start working with the file using another module, "pdfrw"
    print(pdf)
```
It appears that, to avoid slow disk I/O, you want to replace

```python
with open("virtual_file.pdf", 'wb') as tmp:
    large.write(tmp)

with open("virtual_file.pdf", 'rb') as tmp:
    pdf = PdfReader(tmp)
```

with

```python
import io

buf = io.BytesIO()
large.write(buf)
buf.seek(0)  # rewind to the start so PdfReader reads from byte 0
pdf = PdfReader(buf)
```
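The `buf.seek(0)` call matters: after writing, the buffer's position sits at the end, so a reader that calls `read()` would get nothing. A small stdlib-only sketch of that, wrapped in a reusable helper. `FakeWriter` is a hypothetical stand-in for `PdfFileWriter` (it only mimics a `write(stream)` method) and `writer_to_buffer` is a hypothetical helper name, not a library function.

```python
import io


class FakeWriter:
    """Hypothetical stand-in for PdfFileWriter: write(stream) dumps bytes into stream."""

    def __init__(self, data: bytes) -> None:
        self.data = data

    def write(self, stream) -> None:
        stream.write(self.data)


def writer_to_buffer(writer) -> io.BytesIO:
    """Hypothetical helper: render any object with a write(stream) method into RAM."""
    buf = io.BytesIO()
    writer.write(buf)
    buf.seek(0)  # position is at the end after writing; rewind so readers start at byte 0
    return buf


writer = FakeWriter(b"%PDF-1.4 placeholder")

# Without seek(0), a reader that calls read() gets nothing back.
unrewound = io.BytesIO()
writer.write(unrewound)
print(unrewound.read())  # b''

# With the helper, the whole payload is readable, like a file opened in 'rb' mode.
print(writer_to_buffer(writer).read())  # b'%PDF-1.4 placeholder'
```

Returning the buffer already rewound keeps the call site to a single line, e.g. `pdf = PdfReader(writer_to_buffer(large))`, assuming `large` exposes `write(stream)` as in the question.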
Also, `buf.getvalue()` is available to you if you need the PDF contents as a `bytes` object.
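A short stdlib-only sketch of how `getvalue()` differs from `read()`: `read()` starts at the current position, whereas `getvalue()` always returns the whole buffer and does not move the position. The payload is a placeholder, not a real PDF.

```python
import io

buf = io.BytesIO()
buf.write(b"%PDF-1.4 placeholder")

# read() starts from the current position, which is at the end after writing ...
print(buf.read())  # b''
# ... while getvalue() returns the entire buffer without moving the position.
data = buf.getvalue()
print(data)        # b'%PDF-1.4 placeholder'
print(buf.tell())  # 20

# The bytes can be wrapped in a fresh buffer, which starts at position 0.
copy = io.BytesIO(data)
print(copy.read() == data)  # True
```

This is handy when something wants raw bytes (for example, storing the PDF or sending it over a network) rather than a stream.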