
PyPDF2: writing output to stdout fails with python3


I am trying to use Python 3.7.2 with PyPDF2 1.26 to select some pages of an input PDF file and write the output to stdout (the actual code is more complicated; this is just an MCVE):

import sys
from PyPDF2 import PdfFileReader, PdfFileWriter

input = PdfFileReader("example.pdf")
output = PdfFileWriter()
output.addPage(input.getPage(0))

output.write(sys.stdout)

This fails with the following error:

UserWarning: File <<stdout>> to write to is not in binary mode. It may not be written to correctly. [pdf.py:453]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/site-packages/PyPDF2/pdf.py", line 487, in write
    stream.write(self._header + b_("\n"))
TypeError: write() argument must be str, not bytes

The problem seems to be that sys.stdout is not open in binary mode. As some of the answers suggest, I have tried the following:

output.write(sys.stdout.buffer)

This fails with the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/site-packages/PyPDF2/pdf.py", line 491, in write
    object_positions.append(stream.tell())
OSError: [Errno 29] Illegal seek

I have also tried the answer from "Changing the way stdin/stdout is opened in Python 3":

sout = open(sys.stdout.fileno(), "wb")
output.write(sout)

This fails with the same error as above.

How can I use the PyPDF2 library to output a PDF to standard output?

More generally, how do I correctly switch sys.stdout to binary mode (akin to Perl's binmode STDOUT)?

Note: There is no need to tell me that I can open a file in binary mode and write the PDF to that file. That works; however, I specifically want to write the PDF to stdout.


Solution

  • From the documentation:

    write(stream)

    Writes the collection of pages added to this object out as a PDF file.

    Parameters: stream – An object to write the file to. The object must support the write method and the tell method, similar to a file object.

    It turns out that sys.stdout.buffer does not support tell() unless stdout is redirected to a file (a terminal or a pipe is not seekable), hence you can't use it as a stream for PdfFileWriter.write in that case.

    Say your script is called myscript. If you call just myscript, then you'll get this error, but if you use it with a redirection, as in:

    myscript > myfile.pdf
    

    then stdout is a regular file, which is seekable, so tell() works and you won't get the error.
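If redirecting to a file is not an option (for example, when piping into another program), one possible workaround is to let the writer fill an in-memory buffer, which does support tell(), and then copy the bytes to sys.stdout.buffer. This is only a sketch: the helper name write_pdf_to_stdout is made up for illustration and is not part of PyPDF2, and it assumes the whole PDF fits in memory.

```python
import io
import sys


def write_pdf_to_stdout(writer, out=None):
    """Serialise `writer` into memory, then copy the bytes to `out`.

    `writer` is any object with a write(stream) method, such as a
    PyPDF2 PdfFileWriter. `out` defaults to sys.stdout.buffer.
    (Hypothetical helper, not part of PyPDF2.)
    """
    if out is None:
        # Binary layer underneath the text-mode sys.stdout.
        out = sys.stdout.buffer
    # In-memory stream: supports both write() and tell().
    buffer = io.BytesIO()
    writer.write(buffer)
    data = buffer.getvalue()
    # Only write() is needed here, so a pipe or terminal is fine.
    out.write(data)
    out.flush()
    return len(data)
```

With the script from the question, this would mean replacing output.write(sys.stdout) with write_pdf_to_stdout(output). The final copy never calls tell() on stdout, so it should not matter whether stdout is a terminal, a pipe, or a file.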