Consider the following minimal setup:
```
mymodule/
├── __init__.py
├── main.py
└── worker.py
```
`__init__.py` is empty. `main.py`:
```python
import sys
import logging
import multiprocessing

from mymodule.worker import do_stuff

logging.basicConfig(
    format='[%(name)s] [%(levelname)s]: %(message)s',
    level=logging.DEBUG,
)

logger = logging.getLogger(__name__)

def main():
    logger.debug('I am main. I manage workers')
    logger.info('I am main. I manage workers')
    logger.warning('I am main. I manage workers')
    p = multiprocessing.Process(target=do_stuff)
    p.start()

if __name__ == '__main__':
    sys.exit(main())
```
`worker.py`:

```python
import logging

logger = logging.getLogger(__name__)

def do_stuff():
    logger.debug('I am a worker. I do stuff')
    logger.info('I am a worker. I do stuff')
    logger.error('I am a worker. I do stuff')
    logger.error(f'Here is my logger: {logger}')
```
If I run `python -m mymodule.main`, I get (as expected):

```
[__main__] [DEBUG]: I am main. I manage workers
[__main__] [INFO]: I am main. I manage workers
[__main__] [WARNING]: I am main. I manage workers
[mymodule.worker] [DEBUG]: I am a worker. I do stuff
[mymodule.worker] [INFO]: I am a worker. I do stuff
[mymodule.worker] [ERROR]: I am a worker. I do stuff
[mymodule.worker] [ERROR]: Here is my logger: <Logger mymodule.worker (DEBUG)>
```
But if I just rename `mymodule/main.py` to `mymodule/__main__.py` and then run either `python -m mymodule.__main__` or `python -m mymodule`, I get this:

```
[__main__] [DEBUG]: I am main. I manage workers
[__main__] [INFO]: I am main. I manage workers
[__main__] [WARNING]: I am main. I manage workers
I am a worker. I do stuff
Here is my logger: <Logger mymodule.worker (WARNING)>
```
It is pretty clear to me that in the second case the `mymodule.worker` logger did not inherit the configuration done by `logging.basicConfig`. But why? The code did not change; only the file name did, from `main.py` to `__main__.py`.

I would like to have a `__main__.py` so the module can easily be run by its name, while also keeping a `logging.basicConfig` that is inherited by imported submodules and propagates correctly to subprocesses.
You're on Windows, which doesn't support `fork`. That means multiprocessing has to create workers by spawning a fresh Python process, in which your logging config hasn't happened.
Now, people often want to use functions and classes they defined in their `__main__` module from a worker. But they don't want launching a worker to effectively rerun their whole script. So when multiprocessing spawns a new worker process without forking, it does a sort of "pseudo-import" of `__main__`.
The worker runs your main script under the name `__mp_main__` instead of `__main__`, to try to get any function and class definitions set up, without running anything under an `if __name__ == '__main__'` guard. Then it installs the results of doing that as the new process's `__main__` module, before continuing on with the worker's task.
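That pseudo-import is essentially a `runpy.run_module(..., run_name='__mp_main__')` call. This standalone sketch (the module name `demo_mod` is made up) shows that top-level code re-runs while guarded code does not:

```python
import pathlib
import runpy
import sys
import tempfile
import textwrap

with tempfile.TemporaryDirectory() as d:
    # A tiny stand-in for a main script: one top-level flag, one guarded flag.
    (pathlib.Path(d) / "demo_mod.py").write_text(textwrap.dedent("""\
        ran_top_level = True      # module-level code runs on pseudo-import
        ran_guarded = False
        if __name__ == '__main__':
            ran_guarded = True    # guarded code is skipped under __mp_main__
        """))
    sys.path.insert(0, d)
    try:
        # This is roughly what the spawned worker does with your script.
        ns = runpy.run_module("demo_mod", run_name="__mp_main__")
    finally:
        sys.path.remove(d)

print(ns['ran_top_level'], ns['ran_guarded'])  # True False
```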
But `__main__.py` files often don't use `if __name__ == '__main__'` guards. If multiprocessing did the pseudo-import for a file without a guard, it'd effectively rerun the whole script, which is exactly what the pseudo-import was trying to avoid. They could just document "hey, use the guard anyway", but they made a different choice.
If multiprocessing detects that the main script was a package's `__main__.py` file, it skips the whole pseudo-import completely. (Note that the skip logic doesn't kick in if you run a `__main__.py` file directly by file path; that goes through a different function.)
So when your file is named `main.py` and you do `python -m mymodule.main`, multiprocessing "pseudo-imports" your file as `__mp_main__` in the worker, and doing that re-runs your `logging.basicConfig` call.

But when your file is named `__main__.py`, multiprocessing decides the whole pseudo-import would be too risky. It doesn't run anything from your `__main__.py` in the workers, so your workers never configure logging.
If you want to see the code that handles all this, it's in `Lib/multiprocessing/spawn.py`, particularly in `_fixup_main_from_name`:
```python
# Multiprocessing module helpers to fix up the main module in
# spawned subprocesses
def _fixup_main_from_name(mod_name):
    # __main__.py files for packages, directories, zip archives, etc, run
    # their "main only" code unconditionally, so we don't even try to
    # populate anything in __main__, nor do we make any changes to
    # __main__ attributes
    current_main = sys.modules['__main__']
    if mod_name == "__main__" or mod_name.endswith(".__main__"):
        return

    # If this process was forked, __main__ may already be populated
    if getattr(current_main.__spec__, "name", None) == mod_name:
        return

    # Otherwise, __main__ may contain some non-main code where we need to
    # support unpickling it properly. We rerun it as __mp_main__ and make
    # the normal __main__ an alias to that
    old_main_modules.append(current_main)
    main_module = types.ModuleType("__mp_main__")
    main_content = runpy.run_module(mod_name,
                                    run_name="__mp_main__",
                                    alter_sys=True)
    main_module.__dict__.update(main_content)
    sys.modules['__main__'] = sys.modules['__mp_main__'] = main_module
```
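The decisive check is just a string test on the module name. Pulled out into a standalone helper (the name `skips_pseudo_import` is made up for illustration), it classifies the two invocations from the question:

```python
def skips_pseudo_import(mod_name):
    # Mirrors the first early return in _fixup_main_from_name.
    return mod_name == "__main__" or mod_name.endswith(".__main__")

print(skips_pseudo_import("mymodule.main"))      # False: pseudo-imported
print(skips_pseudo_import("mymodule.__main__"))  # True: skipped entirely
```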