Tags: python, python-3.x, pyqt, subprocess, qtwebkit

How to Clean Up subprocess.Popen Instances Upon Process Termination


I have a JavaScript application running on a Python / PyQt / QtWebKit foundation which creates subprocess.Popen objects to run external processes.

Popen objects are kept in a dictionary and referenced by an internal identifier so that the JS app can call Popen's methods via a pyqtSlot such as poll() to determine whether the process is still running or kill() to kill a rogue process.

If a process is not running any more, I would like to remove its Popen object from the dictionary for garbage collection.

What would be the recommended approach to cleaning up the dictionary automatically to prevent a memory leak?

My ideas so far:

  • Call Popen.wait() in a thread per spawned process to perform an automatic cleanup right upon termination.
    PRO: Immediate cleanup; the threads probably do not cost much CPU power as they should be sleeping, right?
    CON: Many threads, depending on spawning activity.
  • Use a single thread to call Popen.poll() on all existing processes, check returncode to see whether they have terminated, and clean up in that case.
    PRO: Just one worker thread for all processes, lower memory usage.
    CON: Periodic polling necessary, higher CPU usage if there are many long-running processes or lots of processes spawned.
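
To make the first idea concrete, here is a minimal sketch of a watcher thread per process. The names `processes`, `processes_lock` and `spawn` are hypothetical stand-ins for the real dictionary and slot code, and the lock is an assumption about how the dictionary would be shared safely between threads.

```python
import subprocess
import sys
import threading

# Hypothetical registry: internal identifier -> Popen object
processes = {}
processes_lock = threading.Lock()

def spawn(key, args):
    """Start a process, register it, and start a thread that unregisters it on exit."""
    proc = subprocess.Popen(args)
    with processes_lock:
        processes[key] = proc

    def watch():
        proc.wait()  # Blocks (sleeping) until the process terminates
        with processes_lock:
            processes.pop(key, None)

    watcher = threading.Thread(target=watch, daemon=True)
    watcher.start()
    return proc, watcher

if __name__ == "__main__":
    proc, watcher = spawn("job-1", [sys.executable, "-c", "pass"])
    watcher.join()
    print(processes)  # The watcher thread has already removed the entry
```

Each watcher spends its life blocked in wait(), so the cost is mostly the per-thread memory rather than CPU time.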

Which one would you choose, and why? Or is there a better solution?


Solution

  • For a platform-agnostic solution, I'd go with option #2, since the "CON" of high CPU usage can be circumvented with something like...

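    import time
    
    # Assuming the Popen objects are in the dictionary values
    PROCESS_DICT = { ... }
    
    def my_thread_main():
        while True:
            dead_keys = []
            # Iterate over a snapshot, since other threads may add entries
            for k, v in list(PROCESS_DICT.items()):
                # poll() returns the exit code, or None if still running
                if v.poll() is not None:
                    dead_keys.append(k)
            if not dead_keys:
                time.sleep(1)  # Adjust sleep time to taste
                continue
            for k in dead_keys:
                # pop() tolerates a key that was already removed elsewhere
                PROCESS_DICT.pop(k, None)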
    

    ...whereby, if no processes died on an iteration, you just sleep for a bit.

    So, in effect, your thread would still be sleeping most of the time, and although there's potential latency between a child process dying and its subsequent 'cleanup', it's really not a big deal, and this should scale better than using one thread per process.
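
Since the dictionary is also touched from the pyqtSlot side, one possible refinement is to guard it with a lock and give the polling thread a way to stop. This is only a sketch under those assumptions; `process_dict`, `process_lock`, `stop_event`, `reap_finished` and `reaper_main` are hypothetical names, not part of the code above.

```python
import subprocess
import sys
import threading

# Hypothetical names, chosen for illustration only
process_dict = {}                 # internal identifier -> Popen object
process_lock = threading.Lock()   # guards process_dict across threads
stop_event = threading.Event()    # set this to end the reaper thread

def reap_finished():
    """Remove every terminated process from process_dict and return its keys."""
    with process_lock:
        dead_keys = [k for k, p in process_dict.items() if p.poll() is not None]
        for k in dead_keys:
            del process_dict[k]
    return dead_keys

def reaper_main(interval=1.0):
    """Poll until stop_event is set, sleeping for `interval` seconds between passes."""
    while not stop_event.wait(interval):
        reap_finished()
```

The thread could be started with `threading.Thread(target=reaper_main, daemon=True).start()`, with any code that adds or looks up entries taking `process_lock` as well. Using `stop_event.wait(interval)` instead of `time.sleep()` lets the thread exit promptly on shutdown.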

    There are better platform-dependent solutions, however.

    For Windows, you should be able to use the WaitForMultipleObjects function via ctypes as ctypes.windll.kernel32.WaitForMultipleObjects, although you'd have to look into the feasibility. Note that a single call can wait on at most 64 handles (MAXIMUM_WAIT_OBJECTS).

    For OSX and Linux, it's probably easiest to handle SIGCHLD asynchronously, using the signal module. Keep in mind that Python only runs signal handlers in the main thread.

    A quick n' dirty example...

    import os
    import time
    import signal
    import subprocess
    
    # Map child PID to Popen object
    SUBPROCESSES = {}
    
    # Define handler
    def handle_sigchld(signum, frame):
        # One SIGCHLD may stand for several exited children, so reap in a loop
        while True:
            try:
                pid, _ = os.waitpid(-1, os.WNOHANG)
            except ChildProcessError:
                break  # No child processes left
            if pid == 0:
                break  # Remaining children are still running
            print('Subprocess PID=%d ended' % pid)
            SUBPROCESSES.pop(pid, None)
    
    # Handle SIGCHLD
    signal.signal(signal.SIGCHLD, handle_sigchld)
    
    # Spawn a couple of subprocesses
    p1 = subprocess.Popen(['sleep', '1'])
    SUBPROCESSES[p1.pid] = p1
    p2 = subprocess.Popen(['sleep', '2'])
    SUBPROCESSES[p2.pid] = p2
    
    # Wait for all subprocesses to die
    while SUBPROCESSES:
        print('tick')
        time.sleep(1)
    
    # Done
    print('All subprocesses died')
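
One caveat with reaping children directly through the os module is that the Popen objects never learn the exit status themselves. A possible variation, sketched here as an assumption rather than as part of the example above, is to let the handler call poll() on each tracked Popen instead, so that returncode stays populated. The names `subprocesses` and `spawn` are hypothetical.

```python
import signal
import subprocess
import sys
import time

# Hypothetical registry: child PID -> Popen object
subprocesses = {}

def handle_sigchld(signum, frame):
    """Drop every tracked process that has exited, letting Popen do the reaping."""
    for pid, proc in list(subprocesses.items()):
        if proc.poll() is not None:  # poll() reaps the child and sets returncode
            del subprocesses[pid]

def spawn(args):
    """Start a process and register it under its PID."""
    proc = subprocess.Popen(args)
    subprocesses[proc.pid] = proc
    # Covers the case where the child exited before it was registered
    handle_sigchld(None, None)
    return proc

if __name__ == "__main__":
    signal.signal(signal.SIGCHLD, handle_sigchld)  # POSIX only, main thread only
    proc = spawn([sys.executable, "-c", "pass"])
    while subprocesses:
        time.sleep(0.1)
    print("exit status:", proc.returncode)
```

The trade-off is that the handler scans all tracked processes on every SIGCHLD, which is cheap for a handful of children but grows with the size of the dictionary.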