
Read with a timeout from a local process in a pseudo-terminal


I want to read the output of a local process, for example the first line printed by "tcpdump":

tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes

using "ptyprocess" (context: local process, terminal involved) and select() to wait for new data with a timeout:

import logging
from ptyprocess import PtyProcess
from select import select

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(message)s")

pty_process = PtyProcess.spawn(
    argv=["sudo", "tcpdump", "-w", "capture.pcap", "-i", "enp0s3"],
    echo=True)
while True:
    rlist, _, _ = select([pty_process.fd], [], [], 1)
    if pty_process.fd in rlist:
        try:
            data = pty_process.read(1)
        except EOFError:
            logging.debug("EOF")
            break
        logging.debug("read: %r", data)
    else:
        logging.debug("timeout")

With Python 3.x (tested with 3.6.10 and 3.8.1) this code reads the whole line printed by "tcpdump", one character at a time.

With Python 2.x (tested with 2.7.17) this code reads only the first character "t"; after that, select() times out. On the very first run I also saw more than one character being read, but never the whole line.

Tested on Debian 10.

How can I use select() with a timeout (or something similar) together with "ptyprocess" in Python 2 to wait for new data before reading the next character?

Update 1:

strace shows the following difference:

Python 2:

select(6, [5], [], [], {tv_sec=1, tv_usec=0}) = 1 (in [5], left {tv_sec=0, tv_usec=999993})
read(5, "tcpdump: listening on enp0s3, li"..., 8192) = 86

Python 3:

select(6, [5], [], [], {tv_sec=1, tv_usec=0}) = 1 (in [5], left {tv_sec=0, tv_usec=999994})
read(5, "t", 1)                         = 1

That is, Python 2 calls read(..., 8192), whereas Python 3 calls read(..., 1). How can I make Python 2 call read(..., 1) as well?

Update 2:

The problem is independent of "tcpdump" and can also be reproduced like this:

import logging
from ptyprocess import PtyProcess
from select import select

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(message)s")

pty_process = PtyProcess.spawn(
    argv=["bash", "-c", "echo 123 ; sleep 3"],
    echo=True)
while True:
    rlist, _, _ = select([pty_process.fd], [], [], 1)
    if pty_process.fd in rlist:
        try:
            data = pty_process.read(1)
        except EOFError:
            logging.debug("EOF")
            break
        logging.debug("read: %r", data)
    else:
        logging.debug("timeout")

Python 2 output:

2020-04-23 12:51:27,126 root read: '1'
2020-04-23 12:51:28,193 root timeout
2020-04-23 12:51:29,204 root timeout
2020-04-23 12:51:30,129 root read: '2'
2020-04-23 12:51:30,129 root read: '3'
2020-04-23 12:51:30,129 root read: '\r'
2020-04-23 12:51:30,130 root read: '\n'
2020-04-23 12:51:30,130 root EOF

Python 3 output:

2020-04-23 12:51:23,106 root read: b'1'
2020-04-23 12:51:23,107 root read: b'2'
2020-04-23 12:51:23,107 root read: b'3'
2020-04-23 12:51:23,107 root read: b'\r'
2020-04-23 12:51:23,107 root read: b'\n'
2020-04-23 12:51:24,109 root timeout
2020-04-23 12:51:25,109 root timeout
2020-04-23 12:51:26,109 root EOF

Solution

  • PtyProcess.read() calls self.fileobj.read1(). PtyProcess.fileobj has type BufferedRWPair. BufferedRWPair.read1() delegates to BufferedRWPair.reader.read1(). The constructor of BufferedRWPair creates a BufferedReader object from the parameter reader.

    In Python 2.7.16 Modules/_io/bufferedio.c/buffered_read1() calls _bufferedreader_fill_buffer(self), which does:

    len = self->buffer_size - start;
    n = _bufferedreader_raw_read(self, self->buffer + start, len);
    

    In Python 3.8.1 Modules/_io/bufferedio.c/_io__Buffered_read1_impl() calls:

    r = _bufferedreader_raw_read(self, PyBytes_AS_STRING(res), n);
    

    In other words, in Python 3 BufferedReader.read1(n) raw-reads n bytes, whereas in Python 2 it reads more bytes to fill the buffer.

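    Consequently, read(1), which works on the buffer, cannot be combined with select(), which works on the underlying file descriptor, in the way the code in the question does. In Python 2 the first read(1) drains all available data from the file descriptor into the buffer, so the next select() times out although unread characters are still sitting in the buffer.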
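The mismatch between the buffer and the file descriptor can be reproduced without ptyprocess. The following is only a minimal sketch, under these assumptions: it uses a plain os.pipe() instead of a pseudo-terminal, and it calls BufferedReader.read(1) rather than read1(1), because read(1) fills the buffer on both Python 2 and Python 3. The function name is made up for illustration.

```python
import io
import os
from select import select


def demo_buffered_read_hides_data():
    """Show that select() on the fd misses data held in a BufferedReader."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"123\n")

    # io.open() in binary mode returns a BufferedReader around the raw fd.
    reader = io.open(read_fd, "rb")

    # A single raw read pulls all four pending bytes into the buffer,
    # but only the first byte is handed back to the caller.
    first = reader.read(1)

    # The kernel side of the pipe is now empty, so select() times out
    # although three bytes are still waiting in the buffer.
    rlist, _, _ = select([read_fd], [], [], 0.1)
    fd_ready = read_fd in rlist

    # These bytes come straight from the buffer.
    rest = reader.read(3)

    reader.close()  # also closes read_fd
    os.close(write_fd)
    return first, fd_ready, rest


if __name__ == "__main__":
    print(demo_buffered_read_hides_data())
```

Here select() reports the descriptor as not ready even though "23\n" has not been consumed yet, which matches the timeouts in the Python 2 output of the question.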

    The following code, which uses pexpect instead of ptyprocess, reads with a timeout:

    import logging
    import pexpect
    
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(name)s %(message)s")
    
    child = pexpect.spawn("bash -c 'echo 123 ; sleep 3'")
    while True:
        try:
            data = child.read_nonblocking(size=1, timeout=1)
            logging.debug("read: %r", data)
        except pexpect.TIMEOUT:
            logging.debug("timeout")
        except pexpect.EOF:
            logging.debug("EOF")
            break
    

    Output:

    2020-04-26 14:54:56,006 root read: '1'
    2020-04-26 14:54:56,007 root read: '2'
    2020-04-26 14:54:56,007 root read: '3'
    2020-04-26 14:54:56,007 root read: '\r'
    2020-04-26 14:54:56,007 root read: '\n'
    2020-04-26 14:54:57,009 root timeout
    2020-04-26 14:54:58,010 root timeout
    2020-04-26 14:54:59,008 root EOF
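If staying with select() is preferred, another option might be to bypass the buffered file object and read from the file descriptor directly with os.read(), so that select() and the read operate on the same thing. The following is only a sketch: read_with_timeout is a hypothetical helper, not part of ptyprocess or pexpect, and it has not been tested against ptyprocess here. It assumes Linux, where reading from a pty master after the child side has been closed fails with EIO.

```python
import errno
import os
from select import select


def read_with_timeout(fd, size, timeout):
    """Return up to `size` bytes from fd, None on timeout, b"" on EOF.

    Hypothetical helper: it reads unbuffered via os.read(), so select()
    always sees exactly the data that has not been consumed yet.
    """
    rlist, _, _ = select([fd], [], [], timeout)
    if fd not in rlist:
        return None  # nothing arrived within `timeout` seconds
    try:
        return os.read(fd, size)
    except OSError as exc:
        # Assumption: on Linux a closed pty reports end of input as EIO.
        if exc.errno == errno.EIO:
            return b""
        raise
```

With ptyprocess, the descriptor to pass would presumably be pty_process.fd, for example read_with_timeout(pty_process.fd, 1, 1). Such calls should not be mixed with pty_process.read() on the same process, because any data already pulled into the buffer is invisible to os.read().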