Tags: python, multithreading, multiprocessing, try-catch-finally

Python: multiprocessing.map: If one process raises an exception, why aren't other processes' finally blocks called?


My understanding is that finally clauses must *always* be executed if the try has been entered.

import random

from multiprocessing import Pool
from time import sleep

def Process(x):
  try:
    print x
    sleep(random.random())
    raise Exception('Exception: ' + x)
  finally:
    print 'Finally: ' + x

Pool(3).map(Process, ['1','2','3'])

The expected output is that for each x printed by `print x` (line 8 of the script), there is a matching 'Finally: x' line.

Example output:

$ python bug.py 
1
2
3
Finally: 2
Traceback (most recent call last):
  File "bug.py", line 14, in <module>
    Pool(3).map(Process, ['1','2','3'])
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 225, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 522, in get
    raise self._value
Exception: Exception: 2

It seems that an exception terminating one process also terminates the parent and sibling processes, even though the siblings still have work, and cleanup, left to do.

Why am I wrong? Why is this correct? If this is correct, how should one safely clean up resources in multiprocess Python?


Solution

  • Short answer: SIGTERM trumps finally.

    Long answer: Turn on logging with mp.log_to_stderr():

    import random
    import multiprocessing as mp
    import time
    import logging
    
    logger=mp.log_to_stderr(logging.DEBUG)
    
    def Process(x):
        try:
            logger.info(x)
            time.sleep(random.random())
            raise Exception('Exception: ' + x)
        finally:
            logger.info('Finally: ' + x)
    
    result=mp.Pool(3).map(Process, ['1','2','3'])
    

    The logging output includes:

    [DEBUG/MainProcess] terminating workers
    

    Which corresponds to this code in multiprocessing.pool._terminate_pool:

        if pool and hasattr(pool[0], 'terminate'):
            debug('terminating workers')
            for p in pool:
                p.terminate()
    

    Each p in pool is a multiprocessing.Process, and terminate (at least on non-Windows machines) sends SIGTERM:

    from multiprocessing/forking.py:

    class Popen(object):
        def terminate(self):
            ...
                try:
                    os.kill(self.pid, signal.SIGTERM)
                except OSError, e:
                    if self.wait(timeout=0.1) is None:
                        raise
    
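
    You can see the same effect in isolation, without a Pool, by terminating a multiprocessing.Process directly while it is inside a try suite. This is a sketch (not from the original answer); on POSIX the child's exit code is the negated signal number, and the finally suite never prints:

    ```python
    import multiprocessing as mp
    import signal
    import time

    def worker():
        try:
            time.sleep(100)
        finally:
            # Never reached: SIGTERM kills the process before it can run.
            print('finally ran')

    if __name__ == '__main__':
        p = mp.Process(target=worker)
        p.start()
        time.sleep(0.5)    # give the child time to enter the try suite
        p.terminate()      # sends SIGTERM on POSIX
        p.join()
        print(p.exitcode)  # negative signal number on POSIX, i.e. -signal.SIGTERM
    ```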

    So it comes down to what happens when a Python process in a try suite is sent a SIGTERM.

    Consider the following example (test.py):

    import time    
    def worker():
        try:
            time.sleep(100)        
        finally:
            print('enter finally')
            time.sleep(2) 
            print('exit finally')    
    worker()
    

    If you run it and then send the process a SIGTERM, it ends immediately without entering the finally suite: there is no output and no delay.

    In one terminal:

    % test.py
    

    In second terminal:

    % pkill -TERM -f "test.py"
    

    Result in first terminal:

    Terminated
    

    Compare that with what happens when the process is sent a SIGINT (Ctrl-C):

    In second terminal:

    % pkill -INT -f "test.py"
    

    Result in first terminal:

    enter finally
    exit finally
    Traceback (most recent call last):
      File "/home/unutbu/pybin/test.py", line 14, in <module>
        worker()
      File "/home/unutbu/pybin/test.py", line 8, in worker
        time.sleep(100)        
    KeyboardInterrupt
    
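
    The asymmetry has a simple explanation: at startup, CPython installs its own SIGINT handler, which raises KeyboardInterrupt (so the stack unwinds and finally suites run), while SIGTERM is left at the OS default of immediate termination. A quick check (a sketch, assuming a standard interpreter with the default handlers still in place):

    ```python
    import signal

    # CPython replaces the OS default for SIGINT with a handler that
    # raises KeyboardInterrupt, which unwinds the stack normally...
    print(signal.getsignal(signal.SIGINT) is signal.default_int_handler)

    # ...but leaves SIGTERM at the OS default: terminate immediately,
    # with no chance for finally suites to run.
    print(signal.getsignal(signal.SIGTERM) is signal.SIG_DFL)
    ```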

    Conclusion: SIGTERM trumps finally.
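
    As for the question's last part, safely cleaning up resources, one option (a sketch, not from the original answer) is to install a SIGTERM handler inside the worker that raises an exception, so the stack unwinds and finally suites run even when the pool terminates the process. Here the worker signals itself to simulate the pool terminating it mid-task:

    ```python
    import os
    import signal

    cleanup_log = []

    def _on_sigterm(signum, frame):
        # Convert SIGTERM into a normal exception so try/finally unwinds.
        raise SystemExit(1)

    signal.signal(signal.SIGTERM, _on_sigterm)

    def worker():
        try:
            # Simulate the pool terminating us mid-task.
            os.kill(os.getpid(), signal.SIGTERM)
        finally:
            cleanup_log.append('cleaned up')  # now runs despite the SIGTERM

    try:
        worker()
    except SystemExit:
        pass

    print(cleanup_log)
    ```

    In a real Pool worker you would install the handler at the top of the worker function, or via Pool's initializer argument. Note this only helps on POSIX: on Windows, terminate() uses TerminateProcess, which no handler can intercept.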