Tags: python, multithreading, subprocess, python-asyncio, real-time



Get Result of a SubProcess in Real Time

I would like to get each result (sys.stdout) in real time before the subprocess terminates. Suppose we have the following file.py.

import time,sys
sys.stdout.write('something')
while True:
    sys.stdout.write('something else')
    time.sleep(4)

Well, I tried the subprocess, asyncio and threading modules, but all of them only give me the result once the process has finished. Ideally I would like to terminate the process myself and get each result (stdout, stderr) in real time, not only when the process completes.

import subprocess, sys

proc = subprocess.Popen([sys.executable, "/Users/../../file.py"],
                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
proc.communicate()  # this only returns the output after the process has finished

I also tried proc.stdout.readline() in a separate thread (with the threading module) as well as with asyncio, but it too waits until the process completes. A sketch of that attempt is below.
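
Roughly, the threading attempt looked like this (a simplified sketch, reader is just an illustrative name), and it also appeared to block until the process finished:

import subprocess, sys, threading

def reader(pipe):
    # print each line as soon as readline() returns it
    for line in iter(pipe.readline, b''):
        print('got:', line)

proc = subprocess.Popen([sys.executable, "file.py"],
                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
threading.Thread(target=reader, args=(proc.stdout,), daemon=True).start()
proc.wait()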

The only useful thing I found is psutil.Popen(*args, **kwargs): with it I can terminate the process whenever I want and get some stats about it (see the sketch below). But the main issue remains getting each sys.stdout write or print of file.py in real time (asynchronously), at the moment it is printed.
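
For reference, a minimal sketch of that psutil usage (the stats call shown is just one example of what it exposes):

import subprocess, sys
import psutil

# psutil.Popen wraps subprocess.Popen, so the same pipe arguments apply
proc = psutil.Popen([sys.executable, "file.py"],
                    stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
print(proc.cpu_times())  # extra per-process stats from psutil
proc.terminate()         # I can stop the process whenever I want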

*A solution for Python 3.6 is preferred.


Solution

  • As noted in the comments, the first and foremost thing is to ensure that your file.py program actually writes the data the way you think it does.

    For example, the program you have shown will write nothing for about 40 minutes, because that's how long it takes for 14-byte prints issued at 4-second intervals to fill up the 8-kilobyte IO buffer. Even more confusingly, some programs will appear to write data if you test them on a TTY (i.e. just run them), but not when you start them as subprocesses. This is because on a TTY stdout is line-buffered, and on a pipe it is fully buffered. When the output is not flushed, there is simply no way for another program to detect the output because it is stuck inside the subprocess's buffer that it never bothered to share with anyone.

    In other words, don't forget to flush:

    while True:
        # or just print('something else', flush=True)
        sys.stdout.write('something else')
        sys.stdout.flush()
        time.sleep(4)
    

    With that out of the way, let's examine how to read that output. Asyncio provides a nice stream-based interface to subprocesses that is quite capable of accessing arbitrary output as it arrives. For example:

    import asyncio
    
    async def main():
        loop = asyncio.get_event_loop()
        proc = await asyncio.create_subprocess_exec(
            "python", "file.py",
            stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
        )
        # loop.create_task() rather than asyncio.create_task() because Python 3.6
        loop.create_task(display_as_arrives(proc.stdout, 'stdout'))
        loop.create_task(display_as_arrives(proc.stderr, 'stderr'))
        await proc.wait()
    
    async def display_as_arrives(stream, where):
        while True:
            # 1024 chosen arbitrarily - StreamReader.read will happily return
            # shorter chunks - this allows reading in real-time.
            output = await stream.read(1024)
            if output == b'':
                break
            print('got', where, ':', output)
    
    # run_until_complete() rather than asyncio.run() because Python 3.6
    asyncio.get_event_loop().run_until_complete(main())
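
    If you want to stop the child yourself instead of waiting for it to exit, you can combine the same readers with terminate(). The sketch below is a variant of main() with an assumed 10-second cutoff (the delay is purely illustrative); display_as_arrives is the helper defined above:

    async def main():
        loop = asyncio.get_event_loop()
        proc = await asyncio.create_subprocess_exec(
            "python", "file.py",
            stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
        )
        loop.create_task(display_as_arrives(proc.stdout, 'stdout'))
        loop.create_task(display_as_arrives(proc.stderr, 'stderr'))
        # read output for a while, then stop the child ourselves
        await asyncio.sleep(10)
        proc.terminate()
        # reap the process; the readers will see EOF and exit their loops
        await proc.wait()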