Tags: multithreading, python-2.7, popen

Python subprocess hangs as Popen when piping output


I've been through dozens of the "Python subprocess hangs" articles here and believe I've addressed all of the issues they raise in the code below.

My code intermittently hangs at the Popen call. I am running four threads using multiprocessing.dummy.apply_async; each thread starts a subprocess and then reads its output line by line, printing a modified version of it to stdout.

    import os
    import subprocess
    import sys
    import textwrap

    # env, device, print_lock and LINE_WRAP_DEFAULT are defined elsewhere.
    def my_subproc():
        exec_command = ['stdbuf', '-i0', '-o0', '-e0',
                        sys.executable, '-u',
                        os.path.dirname(os.path.realpath(__file__)) + '/myscript.py']

        proc = subprocess.Popen(exec_command, env=env,
                                stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                                bufsize=1)
        print "DEBUG1", device

        for line in iter(proc.stdout.readline, b''):
            with print_lock:
                for l in textwrap.wrap(line.rstrip(), LINE_WRAP_DEFAULT):
                    print l  # loop body truncated in the original; presumably prints each wrapped line

The code above is run from apply_async:

    import multiprocessing.dummy

    pool = multiprocessing.dummy.Pool(4)
    for i in range(0, 4):
        pool.apply_async(my_subproc)
    pool.close()
    pool.join()  # without join(), the main thread may exit before the workers run

Intermittently the subprocess hangs at subprocess.Popen and the "DEBUG1" statement is never printed. Sometimes all four threads work; sometimes as few as one of the four does.

I'm not aware that this exhibits any of the known deadlock situations for Popen. Am I wrong?
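For reference, the classic deadlock the subprocess documentation warns about looks like the sketch below: the parent blocks in wait() while the child blocks writing to a full stdout pipe. This is a minimal illustration, assuming a child that emits more than the ~64 KiB OS pipe buffer; do not run it expecting it to finish.

    import subprocess
    import sys

    # Child writes well past the ~64 KiB pipe capacity and blocks on write();
    # the parent blocks in wait() without ever draining the pipe: deadlock.
    child_code = "import sys; sys.stdout.write('x' * 200000)"

    proc = subprocess.Popen([sys.executable, '-c', child_code],
                            stdout=subprocess.PIPE)
    proc.wait()                    # hangs here by design
    print(proc.stdout.read())      # never reached

Using proc.communicate() instead of wait() avoids that particular deadlock, which is why a hang at the Popen() call itself is surprising.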


Solution

  • There is an insidious bug in subprocess.Popen() caused by I/O buffering of stdout (and possibly stderr). There is a limit of around 65536 bytes in the child process's I/O buffer. If the child process writes enough output, it will "hang" waiting for the buffer to be flushed: a deadlock. The authors of subprocess.py seem to regard this as a problem caused by the child, even though a subprocess flush mechanism would be welcome. Anders Pearson, in https://thraxil.org/users/anders/posts/2008/03/13/Subprocess-Hanging-PIPE-is-your-enemy/, has a simple solution, but you have to pay attention. As he says, "tempfile.TemporaryFile() is your friend." In my case I am running an application in a loop to batch-process a bunch of files; the code for the solution is:

        import subprocess as sp
        import tempfile

        # gmat_args and cpto (the child-process timeout in seconds) are set elsewhere.
        with tempfile.TemporaryFile() as fout:
            sp.run(['gmat', '-m', '-ns', '-x', '-r', str(gmat_args)],
                   timeout=cpto, check=True, stdout=fout, stderr=fout)

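    If the program's output is needed afterwards, the temporary file can be rewound and read back before the with block exits. A small sketch, reusing the same hypothetical gmat_args and cpto:

        with tempfile.TemporaryFile() as fout:
            sp.run(['gmat', '-m', '-ns', '-x', '-r', str(gmat_args)],
                   timeout=cpto, check=True, stdout=fout, stderr=fout)
            fout.seek(0)            # rewind: run() leaves the file position at EOF
            output = fout.read()    # bytes written by the child, free of the pipe-buffer limit

    Because the child writes to a real file rather than a pipe, the kernel spills its output to disk and the ~64 KiB pipe limit never comes into play.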

    The fix above still deadlocked after processing about 20 files: an improvement, but not good enough, since I need to process hundreds of files per batch. I came up with the "crowbar" approach below.

        import logging
        import subprocess as sp

        # Run GMAT for each file in the batch. Arguments:
        #   -m:  start GMAT with a minimized interface
        #   -ns: start GMAT without the splash screen showing
        #   -x:  exit GMAT after running the specified script
        #   -r:  automatically run the specified script after loading
        proc = sp.Popen(['gmat', '-m', '-ns', '-x', '-r', str(gmat_args)],
                        stdout=sp.PIPE, stderr=sp.STDOUT)

        # Note: the OS pipe buffer typically holds about 65536 bytes. If the
        # child's unread output exceeds this, it hangs with a write pending
        # until the pipe is drained.
        # https://thraxil.org/users/anders/posts/2008/03/13/Subprocess-Hanging-PIPE-is-your-enemy/
        try:
            # Time out after cpto seconds if the process does not complete.
            (outs, errors) = proc.communicate(timeout=cpto)

        except sp.TimeoutExpired as e:
            logging.error('GMAT timed out in child process. Time allowed was '
                          '%s secs, continuing', str(cpto))
            logging.info("Process %s being terminated.", str(proc.pid))

            # The child process is not killed by the timeout itself.
            proc.kill()

            # And the stdout pipe must still be drained so the child can exit.
            (outs, errors) = proc.communicate()


    The basic idea is to kill the process and drain the pipe on each timeout. I moved the TimeoutExpired handler into the batch-processing loop so that, after killing the process, it continues with the next file. This is harmless as long as the timeout value is long enough to let gmat complete (it just runs slower). I find that the code processes 3 to 20 files before any one of them times out.
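    A sketch of what that batch loop looks like under those assumptions. The loop structure and script_list are illustrative placeholders; cpto and the gmat flags are from the snippets above:

        import logging
        import subprocess as sp

        # Hypothetical batch loop: the TimeoutExpired handler lives inside the
        # loop, so a timed-out file is killed and drained and the batch moves on.
        for script in script_list:  # script_list is assumed to be built elsewhere
            proc = sp.Popen(['gmat', '-m', '-ns', '-x', '-r', str(script)],
                            stdout=sp.PIPE, stderr=sp.STDOUT)
            try:
                outs, errors = proc.communicate(timeout=cpto)
            except sp.TimeoutExpired:
                logging.error('GMAT timed out on %s, continuing', script)
                proc.kill()                         # the timeout does not kill the child
                outs, errors = proc.communicate()   # drain the pipe so it can exit
                continue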

    This really seems like a bug in subprocess.