I'm struggling to find answers on what objects and variables are copied to child processes when creating a multiprocessing pool in Python 3.
In other words, say I have a huge list (~230000000 elements) stored in a class that implements a function that uses a pool of four child processes. Will this list then be copied across to all four child processes if...
Note: this answer is partial in the sense that I too couldn't (yet) find written documentation about this, but the following gives some empirical data.
The following code demonstrates how data is passed/copied to child processes when using a `Pool` (the list `l` is deliberately not used in the `map`, to keep the printouts clean):
```python
from multiprocessing import Pool
import os


def process(x):
    # Print this process's PID, its module name, and whether a global named l exists here
    print(os.getpid(), __name__, 'l' in globals())


# A - l = list(range(100000))

if __name__ == "__main__":
    # B - l = list(range(100000))
    with Pool() as pool:
        pool.map(process, [1, 2, 3, 4])
    print(os.getpid(), __name__, 'l' in globals())
```
On Windows, uncommenting comment A gives a printout similar to:

```
19604 __mp_main__ True
6392 __mp_main__ True
19604 __mp_main__ True
7048 __mp_main__ True
6568 __main__ True
```

This is because the list is defined outside the `__name__` guard, and since on Windows each child process essentially imports the .py file, every child defines its own version of `l`.
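To see why the module-level list shows up in the Windows children without starting any processes, the sketch below imitates what a spawned child does. As far as I can tell, CPython's `spawn` start method re-runs the main script through `runpy.run_path` with `run_name="__mp_main__"`, so everything at module level executes again while the `__name__` guard is skipped. The throwaway script, the file name `demo.py`, and the helper `simulate` are hypothetical stand-ins for the demo above.

```python
import os
import runpy
import tempfile
import textwrap

# Hypothetical stand-in for the demo script: l sits at module level
# (like comment A), m sits inside the __name__ guard (like comment B).
SCRIPT = textwrap.dedent("""
    l = list(range(10))
    if __name__ == "__main__":
        m = list(range(10))
""")


def simulate(run_name):
    """Run SCRIPT as a module named run_name and report which lists exist."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.py")
        with open(path, "w") as f:
            f.write(SCRIPT)
        # run_path executes the file and returns its resulting globals
        namespace = runpy.run_path(path, run_name=run_name)
    return "l" in namespace, "m" in namespace


if __name__ == "__main__":
    print("parent  (__main__):   ", simulate("__main__"))     # (True, True)
    print("spawned (__mp_main__):", simulate("__mp_main__"))  # (True, False)
```

The second tuple mirrors the comment A / comment B contrast in the Windows printouts: the module-level list is rebuilt in every child, the guarded one is not.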
On Windows, uncommenting comment B instead gives a printout similar to:

```
7248 __mp_main__ False
22644 __mp_main__ False
22676 __mp_main__ False
16520 __mp_main__ False
19736 __main__ True
```

That is, since the list is defined inside the `__name__` guard, only the `__main__` process has it defined; the arguments are passed to the different processes through `map`.
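What does get passed is whatever `map` has to pickle: the function, which is pickled by reference (module and name, not its code), plus the items of the iterable. The sketch below compares that with pickling `l` itself, to give a feel for what passing the list as an argument would cost. The variable names are made up for illustration, and the task tuple only approximates what `Pool` actually sends.

```python
import pickle


def process(x):
    # Stand-in for the worker from the demo; the body does not matter here
    return x


l = list(range(100000))

# Approximately what Pool.map serializes in the demo: a reference to the
# function (module + name) and the small integers [1, 2, 3, 4].
task_bytes = len(pickle.dumps((process, [1, 2, 3, 4])))

# What would have to be serialized if l were passed to the workers instead.
list_bytes = len(pickle.dumps(l))

if __name__ == "__main__":
    print("function + [1, 2, 3, 4]:", task_bytes, "bytes")
    print("l itself:              ", list_bytes, "bytes")
```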
On Linux, uncommenting either of the comments gives a printout similar to:

```
25261 __main__ True
25262 __main__ True
25263 __main__ True
25264 __main__ True
25260 __main__ True
```

I am guessing that this is because Linux uses `fork` (by default) to create the child processes, so they are "cloned" from the parent, memory included. Since `l` already exists in the parent by the time the pool is created, the list is defined in the children either way.