python · multiprocessing · concurrent.futures

ProcessPoolExecutor logging fails to log inside function on Windows but not on Unix / Mac


When I run the following script on a Windows computer, I do not see any of the log messages from the log_pid function; however, I do when I run it on Unix / Mac. I've read before that multiprocessing behaves differently on Windows than on Mac, but it's not clear to me what changes I should make to get this script to work on Windows. I'm running Python 3.6.

import logging
import sys
from concurrent.futures import ProcessPoolExecutor
import os


def log_pid(x):
    logger.info('Executing on process: %s' % os.getpid())


def do_stuff():
    logger.info('this is the do stuff function.')
    with ProcessPoolExecutor(max_workers=4) as executor:
        executor.map(log_pid, range(0, 10))


def main():
    logger.info('this is the main function.')
    do_stuff()


if __name__ == '__main__':
    logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
    logger = logging.getLogger(__name__)

    logger.info('Start of script ...')

    main()

    logger.info('End of script ...')

Solution

  • Unix processes are created via the fork strategy: the child is cloned from the parent and continues its execution from the exact point at which the parent forked.

    On Windows it is quite different: a blank process is created and a new Python interpreter is launched. The interpreter then imports the module where the log_pid function is located and executes it.

    This means the __main__ section is not executed by the newly spawned child process. Hence, the logger object is never created in the child, and the log_pid function crashes accordingly. You don't see the error because you ignore the result of the computation. Try modifying the logic as follows.

    def do_stuff():
        logger.info('this is the do stuff function.')
        with ProcessPoolExecutor(max_workers=4) as executor:
            iterator = executor.map(log_pid, range(0, 10))
            list(iterator)  # collect the results in a list
    

    And the issue will become evident.

    Traceback (most recent call last):
      File "C:\Program Files (x86)\Python36-32\lib\concurrent\futures\process.py", line 175, in _process_worker
        r = call_item.fn(*call_item.args, **call_item.kwargs)
      File "C:\Program Files (x86)\Python36-32\lib\concurrent\futures\process.py", line 153, in _process_chunk
        return [fn(*args) for args in chunk]
      File "C:\Program Files (x86)\Python36-32\lib\concurrent\futures\process.py", line 153, in <listcomp>
        return [fn(*args) for args in chunk]
      File "C:\Users\cafama\Desktop\pool.py", line 8, in log_pid
        logger.info('Executing on process: %s' % os.getpid())
    NameError: name 'logger' is not defined
    

    When dealing with process pools (whether concurrent.futures or multiprocessing ones), always collect the results of the computation to prevent silent bugs from causing confusion.
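    As an alternative to draining the map iterator, a sketch using submit and as_completed makes this explicit: future.result() re-raises any exception thrown in the worker, so failures can never be silently dropped. (This reuses the log_pid name from the question but returns the pid instead of logging it, purely for illustration.)

    ```python
    import os
    from concurrent.futures import ProcessPoolExecutor, as_completed


    def log_pid(x):
        # In the broken version this line would raise NameError in the
        # worker on Windows, because `logger` never exists in the child.
        return os.getpid()


    def do_stuff():
        with ProcessPoolExecutor(max_workers=4) as executor:
            futures = [executor.submit(log_pid, i) for i in range(10)]
            # result() re-raises any worker-side exception here,
            # in the parent, where you can actually see it.
            return [future.result() for future in as_completed(futures)]


    if __name__ == '__main__':
        print(do_stuff())
    ```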

    To fix the problem, just move the logger creation to the top level of the module and everything will work on all platforms.

    import logging
    import sys
    from concurrent.futures import ProcessPoolExecutor
    import os
    
    logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
    logger = logging.getLogger(__name__)   
    
    def log_pid(x):
        logger.info('Executing on process: %s' % os.getpid())
    
    ...
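    On Python 3.7 and later there is another option worth knowing (the question targets 3.6, where it is not available): ProcessPoolExecutor accepts an initializer callable that every worker process runs at startup, so each spawned child can configure its own logging. A minimal sketch:

    ```python
    import logging
    import os
    import sys
    from concurrent.futures import ProcessPoolExecutor


    def init_worker():
        # Runs once in every worker process, so children created with
        # the spawn strategy also get a configured root logger.
        logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)


    def log_pid(x):
        logging.getLogger(__name__).info('Executing on process: %s' % os.getpid())
        return os.getpid()


    def main():
        # initializer= requires Python 3.7+
        with ProcessPoolExecutor(max_workers=4, initializer=init_worker) as executor:
            return list(executor.map(log_pid, range(10)))


    if __name__ == '__main__':
        main()
    ```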