I need to run a function several times, each time in a process that is completely isolated from all other memory. I would like to use multiprocessing for that (since I need to serialize a complex output coming from the function). I set the start_method to 'spawn' and use a pool with maxtasksperchild=1. I would expect to get a different process for each task, and therefore see a different PID each time:
import multiprocessing
import time
import os

def f(x):
    print("PID: %d" % os.getpid())
    time.sleep(x)
    complex_obj = 5  # more complex actually
    return complex_obj

if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')
    pool = multiprocessing.Pool(4, maxtasksperchild=1)
    pool.map(f, [5]*30)
    pool.close()
However the output I get is:
$ python untitled1.py
PID: 30010
PID: 30009
PID: 30012
PID: 30011
PID: 30010
PID: 30009
PID: 30012
PID: 30011
PID: 30018
PID: 30017
PID: 30019
PID: 30020
PID: 30018
PID: 30019
PID: 30017
PID: 30020
...
So the processes are not being respawned after every task. Is there an automatic way of getting a new PID each time (i.e. without starting a new pool for each set of processes)?
You also need to specify chunksize=1 in the call to pool.map. Otherwise, multiple items from your iterable get bundled together into a single "task" from the perspective of the worker processes:
import multiprocessing
import time
import os

def f(x):
    print("PID: %d" % os.getpid())
    time.sleep(x)
    complex_obj = 5  # more complex actually
    return complex_obj

if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')
    pool = multiprocessing.Pool(4, maxtasksperchild=1)
    pool.map(f, [5]*30, chunksize=1)
    pool.close()
The output no longer contains repeated PIDs:
PID: 4912
PID: 4913
PID: 4914
PID: 4915
PID: 4938
PID: 4937
PID: 4940
PID: 4939
PID: 4966
PID: 4965
PID: 4970
PID: 4971
PID: 4991
PID: 4990
PID: 4992
PID: 4993
PID: 5013
PID: 5014
PID: 5012
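As a side note, the pairs of repeated PIDs in the original output line up with how pool.map picks its default chunksize. A simplified sketch of the heuristic (adapted from Pool._map_async in CPython's multiprocessing/pool.py; the helper name here is mine) shows that 30 items across 4 workers gives a chunksize of 2, so each worker ran one 2-item chunk, which counts as a single "task", before maxtasksperchild=1 retired it:

# Simplified sketch of CPython's default chunksize calculation for
# Pool.map (adapted from Pool._map_async in multiprocessing/pool.py).
def default_chunksize(n_items, n_workers):
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize

print(default_chunksize(30, 4))  # -> 2: each "task" bundles 2 items,
                                 # matching the twice-repeated PIDs above

Also, if you can use Python 3.11 or newer, concurrent.futures.ProcessPoolExecutor accepts a max_tasks_per_child argument as well, and its map defaults to chunksize=1, so it should give the one-process-per-item behavior without this caveat.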