When using multiprocessing on different platforms, sys.modules seems to be handled differently. If you modify the sys.modules dict on Linux and then start a child process, that process seems to inherit the dict, but on macOS a new object is created for sys.modules. Is this because the way multiprocessing works is not OS-agnostic? Is it due to Linux using fork() when creating a child process and macOS doing it differently?
For example, if I try this snippet:
import multiprocessing
import sys
import time
from types import ModuleType


def worker():
    # print the id of the dict
    print('worker', id(sys.modules), 'foo' in sys.modules)


def main():
    # modify the sys.modules dict
    sys.modules['foo'] = ModuleType('foo')
    # print the id of the dict
    print('main', id(sys.modules), 'foo' in sys.modules)
    p = multiprocessing.Process(target=worker, daemon=True)
    p.start()
    time.sleep(0.1)


if __name__ == '__main__':
    main()
I get these results:
Linux (x86_64, Python 3.10.1)
main 139897509461248 True
worker 139897509461248 True
macOS (M1 ARM, Python 3.10.1)
main 4307217856 True
worker 4334595520 False
For context: I am trying to build a task queue using just the stdlib (à la Celery). However, I could not find an idiomatic way to pass pickled functions to worker processes. In most cases, attribute lookup fails when the worker unpickles the serialized function (when serializing a callable, pickle stores only its name and module, and the callable is looked up again by that name when deserializing).
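The by-reference behaviour described above can be checked directly. This is a minimal sketch using json.dumps as a stand-in for a task function; the helper is_picklable is a hypothetical name introduced only for illustration.

```python
import json
import pickle

# A module-level function is pickled as a reference: its module name
# and qualified name, not its code.
payload = pickle.dumps(json.dumps)
print(b"json" in payload, b"dumps" in payload)

# Unpickling imports the module and looks the name up again, so the
# result is the very same function object.
restored = pickle.loads(payload)
print(restored is json.dumps)


def is_picklable(obj):
    """Hypothetical helper: report whether pickle can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# A lambda has no importable name, so the lookup fails at pickling time.
print(is_picklable(lambda x: x))
```

The same lookup is what fails in a worker when the module named in the payload cannot be imported there, or when it no longer has an attribute with that name.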
A workaround I'm trying is to change the function's __module__ attribute when serializing it, and to add the function to a dynamically created module. This seems to fix the attribute lookup error, but it feels like a bad hack.
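One stdlib-only alternative to rewriting __module__ might be to send the function's import path explicitly and resolve it in the worker. The sketch below assumes the task functions live in modules the worker can import; encode_task and run_task are hypothetical names, not part of any library. It has the same requirement as pickle, but the lookup is explicit and fails with an ordinary ImportError or AttributeError that is easy to trace.

```python
import importlib
import math
import pickle


def encode_task(func, *args, **kwargs):
    # Store only names and arguments; the callable itself is never pickled.
    return pickle.dumps((func.__module__, func.__qualname__, args, kwargs))


def run_task(payload):
    # Worker side: import the module, then walk the qualified name.
    module_name, qualname, args, kwargs = pickle.loads(payload)
    obj = importlib.import_module(module_name)
    for part in qualname.split("."):
        obj = getattr(obj, part)
    return obj(*args, **kwargs)


if __name__ == "__main__":
    task = encode_task(math.sqrt, 16.0)
    print(run_task(task))
```

Functions defined inside other functions would still not resolve this way, since their qualified name contains a "<locals>" component.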
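On macOS, multiprocessing starts child processes with spawn rather than fork by default. Spawn is slower because it launches a fresh interpreter that re-imports packages and modules, so the child gets its own sys.modules and does not see entries the parent added at runtime.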
From the docs:
Changed in version 3.8: On macOS, the spawn start method is now the default. The fork start method should be considered unsafe as it can lead to crashes of the subprocess. See bpo-33725.
You can try multiprocessing.set_start_method('fork') on macOS, at your own risk.
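To compare start methods without changing the global default, multiprocessing.get_context() can be used instead of set_start_method(). The following is a rough sketch, assuming a platform where the fork method exists (it does not on Windows); check_inheritance, child_sees_module and the module name foo_demo are made up for the example.

```python
import multiprocessing
import sys
from types import ModuleType


def child_sees_module(name, queue):
    # Runs in the child: report whether name is present in sys.modules.
    queue.put(name in sys.modules)


def check_inheritance(method, name="foo_demo"):
    """Return True if a child started with method sees a module
    that the parent injected into sys.modules at runtime."""
    ctx = multiprocessing.get_context(method)
    sys.modules[name] = ModuleType(name)
    try:
        queue = ctx.Queue()
        p = ctx.Process(target=child_sees_module, args=(name, queue))
        p.start()
        result = queue.get(timeout=30)
        p.join()
        return result
    finally:
        # Leave the parent's sys.modules as it was.
        del sys.modules[name]


if __name__ == "__main__":
    print("default:", multiprocessing.get_start_method())
    if "fork" in multiprocessing.get_all_start_methods():
        print("fork inherits:", check_inheritance("fork"))
```

With fork the child is a copy of the parent's memory, so the injected module is visible. Calling check_inheritance("spawn") from a script guarded by if __name__ == "__main__" would be expected to return False, matching the macOS output in the question.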