Tags: python, process, python-multiprocessing

Why does the child process inherit `sys.modules` when forking processes on Linux but not on macOS?


When using multiprocessing on different platforms, sys.modules seems to be handled differently. If I modify the sys.modules dict on Linux and then start a child process, the child seems to inherit the modified dict, but on macOS the child gets a new sys.modules object. Is this because multiprocessing is not OS-agnostic?

Is this due to Linux using fork() when creating a child process and macOS doing it differently?

For example, if I try this snippet:

import multiprocessing
import sys
import time
from types import ModuleType


def worker():
    # print the id of the dict
    print('worker', id(sys.modules), 'foo' in sys.modules)


def main():
    # modify the sys.modules dict
    sys.modules['foo'] = ModuleType('foo')

    # print the id of the dict
    print('main', id(sys.modules), 'foo' in sys.modules)

    p = multiprocessing.Process(target=worker, daemon=True)
    p.start()

    time.sleep(0.1)


if __name__ == '__main__':
    main()

I get these results:

Linux (x86_64, Python 3.10.1)

main 139897509461248 True
worker 139897509461248 True

macOS (M1 ARM, Python 3.10.1)

main 4307217856 True
worker 4334595520 False

For context: I am trying to build a task queue using just the stdlib (à la Celery). However, I could not find an idiomatic way to pass pickled functions to worker processes. In most cases, attribute lookup fails when the worker unpickles the serialized function (when serializing a callable, pickle stores only its module and name, and the function is looked up by that name at runtime when deserializing).
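To illustrate the by-reference behaviour described above, here is a minimal sketch. `json.dumps` is only a stand-in for any importable function, and the lambda stands in for a function that cannot be found by name; neither is part of my actual task queue.

```python
import json
import pickle

# Pickling a function stores a reference (module name + qualified name),
# not the function's code.
payload = pickle.dumps(json.dumps)
print(b'json' in payload, b'dumps' in payload)

# Unpickling imports the module and looks the name up again, so the
# result is the very same function object.
restored = pickle.loads(payload)
print(restored is json.dumps)

# A function that cannot be found under its recorded name cannot be
# pickled this way at all.
try:
    pickle.dumps(lambda: 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False
print(lambda_pickled)
```

The same lookup happens in the worker process, which is why it fails there if the module holding the function was never imported (or never existed) in that process.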

A workaround I'm trying is to change the function's __module__ attribute when serializing it, and to add the function to a dynamically created module. This seems to fix the attribute lookup error, but it feels like a bad hack.


Solution

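  • On macOS, multiprocessing starts child processes with spawn rather than fork by default. A spawned child is a fresh interpreter that re-imports the main module and its dependencies, so it builds its own sys.modules and never sees changes the parent made at runtime. A forked child starts as a copy of the parent's memory, which is why the Linux worker sees 'foo' and reports the same id(sys.modules). Spawn is also slower to start than fork.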

    From the docs:

    Changed in version 3.8: On macOS, the spawn start method is now the default. The fork start method should be considered unsafe as it can lead to crashes of the subprocess. See bpo-33725.

    You can try multiprocessing.set_start_method('fork') on macOS at your own risk.
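    To compare the start methods side by side without changing the global default, a sketch along these lines may help. It uses multiprocessing.get_context(); the helper name child_sees_foo is made up for this example, and it reuses the 'foo' module from the question.

```python
import multiprocessing
import sys
from types import ModuleType


def worker(queue):
    # Report whether the module injected by the parent is visible here.
    queue.put('foo' in sys.modules)


def child_sees_foo(method):
    # get_context() returns a context bound to one start method,
    # leaving the process-wide default untouched.
    ctx = multiprocessing.get_context(method)
    queue = ctx.Queue()
    p = ctx.Process(target=worker, args=(queue,))
    p.start()
    result = queue.get()
    p.join()
    return result


if __name__ == '__main__':
    # Modify sys.modules in the parent only, as in the question.
    sys.modules['foo'] = ModuleType('foo')

    available = multiprocessing.get_all_start_methods()
    for method in ('fork', 'spawn'):
        if method in available:
            print(method, child_sees_foo(method))
```

    Run as a script, this should report True for fork and False for spawn, on either OS. If the worker needs something the parent set up at runtime, it is safer to pass it explicitly (via args or a queue) than to rely on fork copying it.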