Tags: python, input, subprocess

How can I read just one line from standard input, and pass the rest to a subprocess?


If you readline() from sys.stdin, passing the rest of it to a subprocess does not seem to work.

import subprocess
import sys

header = sys.stdin.buffer.readline()
print(header)
subprocess.run(['nl'], check=True)

(I'm using sys.stdin.buffer to avoid any encoding issues; this handle returns the raw bytes.)

This runs, but I don't get any output from the subprocess:

bash$ printf '%s\n' foo bar baz | python demo1.py
b'foo\n'

If I take out the readline etc, the subprocess reads standard input and produces the output I expect.

bash$ printf '%s\n' foo bar baz |
> python -c 'import subprocess; subprocess.run(["nl"], check=True)'
     1  foo
     2  bar
     3  baz

Is Python buffering the rest of stdin when I start reading it, or what's going on here? Running with python -u does not remove the problem (and indeed, the documentation for it only mentions that it changes the behavior for stdout and stderr). But if I pass in a larger amount of data, I do get some of it:

bash$ wc -l /etc/services
   13921 /etc/services

bash$ python demo1.py </etc/services  | head -n 3
     1     27/tcp     # NSW User System FE
     2  #                          Robert Thomas <[email protected]>
     3  #                28/tcp    Unassigned
 (... traceback from broken pipe elided ...)

bash$ fgrep -n 'NSW User System FE' /etc/services
91:nsw-fe           27/udp     # NSW User System FE
92:nsw-fe           27/tcp     # NSW User System FE

bash$ sed -n '1,/NSW User System FE/p' /etc/services | wc
      91     449    4082

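(The first 91 lines account for 4082 bytes, and the subprocess output resumes partway through line 92, so it looks like Python consumes the first 4096 bytes.)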

Is there a way I can avoid this behavior, though? I would like to only read one line off from the beginning, and pass the rest to the subprocess.

Calling sys.stdin.buffer.readline(-1) repeatedly in a loop does not help.

This is actually a follow-up to "Read line from shell pipe, pass to exec, and keep to variable", but I wanted to focus on the aspect of that question which surprised me.


Solution

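  • This happens because sys.stdin is created using the built-in open function in its default buffered mode, with a buffer of io.DEFAULT_BUFFER_SIZE bytes, which on most systems is either 4096 or 8192. The first readline() therefore pulls up to a whole buffer's worth of data from standard input, and whatever lands in that buffer is no longer available to the subprocess.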

    To make the parent process consume precisely one line of text from the standard input, you can therefore open it with the buffer disabled by passing 0 as the buffering argument to the open or os.fdopen function:

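    # subp1.py
    import os
    import subprocess
    import sys

    # Reopen standard input unbuffered (binary mode only), so that readline()
    # consumes nothing beyond the first newline.
    # Or, with the platform-dependent device file:
    # unbuffered_stdin = open('/dev/stdin', 'rb', buffering=0)
    unbuffered_stdin = os.fdopen(sys.stdin.fileno(), 'rb', buffering=0)

    # flush=True so the header is written before the subprocess's output,
    # even when stdout is a pipe or a file.
    print(unbuffered_stdin.readline(), flush=True)
    # The child inherits file descriptor 0 and reads the remaining lines.
    subprocess.run(['nl'], check=True)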
    

    so that:

    printf "foo\nbar\n" | python subp1.py
    

    would then output:

    b'foo\n'
         1  bar
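
The difference between the two modes can be reproduced without a shell pipeline. The following is only a minimal sketch: it uses os.pipe() as a stand-in for standard input, and the helper name leftover_after_readline is made up for illustration. After readline(), it reads straight from the file descriptor, which is roughly what a child process inheriting that descriptor would see.

```python
import os


def leftover_after_readline(buffering):
    """Return (first_line, bytes still left in the pipe) for a buffering mode.

    Hypothetical helper for illustration; a pipe stands in for stdin.
    """
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"foo\nbar\nbaz\n")
    os.close(write_fd)  # so that reads reach EOF instead of blocking

    # buffering=-1 is open()'s default; buffering=0 means unbuffered
    # (only permitted in binary mode).
    with os.fdopen(read_fd, "rb", buffering=buffering) as stream:
        first_line = stream.readline()
        # Bypass the Python-level buffer and read from the descriptor itself.
        leftover = os.read(read_fd, 1024)
    return first_line, leftover


print(leftover_after_readline(-1))  # buffered:   (b'foo\n', b'')
print(leftover_after_readline(0))   # unbuffered: (b'foo\n', b'bar\nbaz\n')
```

In buffered mode the whole input has already been moved into Python's buffer, so nothing is left on the descriptor. In unbuffered mode readline() stops right after the first newline. The trade-off is that unbuffered readline() fetches data one byte at a time, which is fine for a single header line but slow for bulk reading.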