Tags: python, python-2.7, python-3.x, multiprocessing, daemon

Why doesn't the daemon program exit without join()?


The answer might be right in front of me on the link below but I still don't understand. I'm sure after someone explains this to me, Darwin will be making a call to me.

The example is at this link here, although I've made some changes to try to experiment and help my understanding.

Here's the code:

import multiprocessing
import time
import sys

def daemon():
    p = multiprocessing.current_process()
    print 'Starting: ', p.name, p.pid
    sys.stdout.flush()
    time.sleep(2)
    print 'Exiting: ', p.name, p.pid
    sys.stdout.flush()

def non_daemon():
    p = multiprocessing.current_process()
    print 'Starting: ', p.name, p.pid
    sys.stdout.flush()
    time.sleep(6)
    print 'Exiting: ', p.name, p.pid
    sys.stdout.flush()

if __name__ == '__main__':
    d = multiprocessing.Process(name='daemon', target=daemon)
    d.daemon = True

    n = multiprocessing.Process(name='non-daemon', target=non_daemon)
    n.daemon = False

    d.start()
    time.sleep(1)
    n.start()
#    d.join()

And the output of the code is:

Starting:  daemon 6173
Starting:  non-daemon 6174
Exiting:  non-daemon 6174

If the join() at the end is uncommented, then the output is:

Starting:  daemon 6247
Starting:  non-daemon 6248
Exiting:  daemon 6247
Exiting:  non-daemon 6248

I'm confused because the daemon sleeps for 2 seconds, whereas the non-daemon sleeps for 6 seconds. Why doesn't the daemon print its "Exiting" message in the first case? It should have woken up before the non-daemon and printed the message.

The explanation from the site is as such:

The output does not include the “Exiting” message from the daemon process, since all of the non-daemon processes (including the main program) exit before the daemon process wakes up from its 2 second sleep.

but I changed it such that the daemon should have woken up before the non-daemon does. What am I missing here? Thanks in advance for your help.

EDIT: Forgot to mention I'm using Python 2.7, but apparently this problem also occurs in Python 3.x.


Solution

  • This was a fun one to track down. The docs are somewhat misleading: they describe non-daemon processes as if they were all equivalent, implying that as long as any non-daemon process exists, the process "family" stays alive. But that's not how it's implemented. The parent process is "more equal" than the others; multiprocessing registers an atexit handler in the parent that does the following:

    for p in active_children():
        if p.daemon:
            info('calling terminate() for daemon %s', p.name)
            p._popen.terminate()
    
    for p in active_children():
        info('calling join() for process %s', p.name)
        p.join()
    

    So when the main process finishes, it first terminates all daemon child processes, then joins all child processes to wait on non-daemon children and clean up resources from daemon children.

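    Because cleanup happens in this order, your main process, which has nothing left to do after n.start(), begins cleanup a moment after the non-daemon Process starts. That is only about one second into the daemon's two-second sleep, so the daemon Process is forcibly terminated before it can print its "Exiting" message.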

    Note that fixing this can be as simple as manually joining the non-daemon process rather than the daemon process (joining the daemon defeats the whole point of a daemon). The main process then blocks in join() and does not reach its atexit handler until the non-daemon child has finished, which delays the cleanup that would terminate the daemon child.
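
    As a rough way to check this yourself, here is a minimal sketch that runs the question's example twice in a fresh interpreter, once as-is and once with the non-daemon child joined. It assumes Python 3.7+ (for capture_output), which differs from the 2.7 used in the question. The names SCRIPT, worker and run_variant are made up for illustration, and the sleeps are shortened to 0.5 and 2 seconds so it finishes quickly.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Script under test: same shape as the question's example, with shorter sleeps.
# Passing "join" as an argument makes the parent join the non-daemon child.
# (worker() is an illustrative helper, not part of the original post.)
SCRIPT = textwrap.dedent('''
    import multiprocessing
    import sys
    import time

    def worker(delay):
        p = multiprocessing.current_process()
        print('Starting:', p.name, flush=True)
        time.sleep(delay)
        print('Exiting:', p.name, flush=True)

    if __name__ == '__main__':
        d = multiprocessing.Process(name='daemon', target=worker, args=(0.5,))
        d.daemon = True
        n = multiprocessing.Process(name='non-daemon', target=worker, args=(2.0,))
        n.daemon = False

        d.start()
        n.start()
        if sys.argv[1:] == ['join']:
            # Parent blocks here, so its exit-time cleanup runs only after
            # the non-daemon child is done; the daemon has time to finish.
            n.join()
''')


def run_variant(*args):
    """Run SCRIPT in a separate interpreter and return its stdout lines."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'daemon_demo.py')
        with open(path, 'w') as f:
            f.write(SCRIPT)
        result = subprocess.run([sys.executable, path, *args],
                                capture_output=True, text=True, timeout=60)
    return result.stdout.splitlines()


without_join = run_variant()        # parent exits right after n.start()
with_join = run_variant('join')     # parent waits for the non-daemon child

print('without join:', without_join)
print('with join:   ', with_join)
```

    If the explanation above holds, the daemon's "Exiting" line should be missing from the first run and present in the second, while the non-daemon's "Exiting" line shows up in both. The daemon's "Starting" line may or may not appear in the first run, depending on how quickly the child gets going before it is terminated.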

    It's arguably a bug (one that seems to exist up through 3.5.1; I reproduced it myself), though it's debatable whether it's a behavior bug or a docs bug.