Assume I have code like the following, where I need to run a function (with different params) x times. However, my Pool size is smaller, e.g. x/2.
from multiprocessing import Pool

args_list = []
# range(8) is just an example. My args are more complex.
for r in range(8):
    args_list.append(r)

with Pool(4) as proc_pool:
    results = proc_pool.map(my_func, args_list)
    proc_pool.close()
    proc_pool.join()
Will Pool only try to process 4 at a time, then move on to the next 4, or will all 8 be processed at once, but only in 4 Pools?
If Pool will try to process all 8 in 4 Pools at once, what is the best way to handle this? (I can put the with Pool code in a loop to only use 4 Pools at once.)
I read the documentation, but it was not clear to me.
The number passed as Pool's first argument is the number of worker processes in the pool (in this case 4). The map function will call your function once for each argument. Each time a worker finishes, it becomes available to run another argument, so at most 4 calls run at the same time.
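One way to see the worker re-use directly is to have each call report the PID of the process that ran it. This is only a minimal sketch: tag_with_pid and run_demo are hypothetical names, and the 0.1 s sleep is an assumption added so that the calls are spread across the workers instead of being consumed by whichever worker starts first.

```python
import os
import time
from multiprocessing import Pool


def tag_with_pid(r):
    """Return the argument together with the PID of the worker that ran it."""
    time.sleep(0.1)  # assumed small delay so the work spreads across workers
    return r, os.getpid()


def run_demo(n_tasks=8, n_workers=4):
    # chunksize=1 hands the arguments to the workers one at a time.
    with Pool(n_workers) as proc_pool:
        return proc_pool.map(tag_with_pid, range(n_tasks), chunksize=1)


if __name__ == "__main__":
    pairs = run_demo()
    print("arguments, in input order:", [r for r, _ in pairs])
    print("distinct worker PIDs:", len({pid for _, pid in pairs}))
```

However many arguments are passed, the number of distinct PIDs should never exceed the pool size, and the results come back in the same order as the inputs.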
To illustrate this, consider the following:
import time

def my_func(r):
    if r == 1:
        time.sleep(120)
    return r * r
The first thing that will happen is that 4 runs will be sent to the workers. All of them will finish almost immediately, except for the one where r == 1. As each worker finishes, it is re-used for another input. So, in the example, 7 of the runs will finish almost immediately but the remaining one will take about 2 minutes. Since the map function waits until all runs have finished before returning the results, the map call will take about 2 minutes to finish.
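To watch this happen without waiting 2 minutes, here is a scaled-down sketch. It makes two assumptions that are not part of the example above: the sleep is shortened to 0.5 s (SLOW_SECONDS), and imap_unordered is used in place of map, because it yields each result as soon as it is ready and so exposes the completion order. completion_order is a hypothetical helper name.

```python
import time
from multiprocessing import Pool

SLOW_SECONDS = 0.5  # stands in for the 120 s sleep, to keep the demo short


def my_func(r):
    if r == 1:
        time.sleep(SLOW_SECONDS)
    return r * r


def completion_order(n_tasks=8, n_workers=4):
    """Return the results in the order they finished, plus the elapsed time."""
    start = time.perf_counter()
    with Pool(n_workers) as proc_pool:
        # imap_unordered yields each result as soon as a worker produces it.
        finished = list(proc_pool.imap_unordered(my_func, range(n_tasks)))
    return finished, time.perf_counter() - start


if __name__ == "__main__":
    finished, elapsed = completion_order()
    print("finished in this order:", finished)
    print(f"elapsed: {elapsed:.2f} s")
```

The result for r == 1 should normally arrive last, and the elapsed time should be a little over the length of the single slow call.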
To give another example:
import time

def my_func(r):
    if r in (1, 3, 5, 7):
        time.sleep(120)
    return r * r
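Half of the runs will complete almost instantly, whereas the other 4 will each take 2 minutes. Because there are 4 workers, those 4 slow runs can execute at the same time, so the total is still about 2 minutes. If five of the runs took 2 minutes (say for r in (1, 2, 3, 5, 7)), the total time would be about 4 minutes: for the first 2 minutes all 4 processes would be sleeping, and for the next 2 minutes 1 process would be sleeping on the fifth slow run.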
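The arithmetic above can be checked with a small simulation that never starts any processes. It is an idealized model, not how Pool is implemented: it assumes the arguments are handed out in order, one at a time, to whichever worker becomes free first, and it ignores all start-up and communication overhead. simulated_wall_time and durations_for are hypothetical helpers.

```python
import heapq


def simulated_wall_time(durations, n_workers):
    """Estimate total wall time for tasks handed out in order, one at a time,
    to whichever worker becomes free first (idealized, no overhead)."""
    # Each heap entry is the time at which one worker becomes free.
    free_at = [0.0] * n_workers
    heapq.heapify(free_at)
    finish = 0.0
    for d in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        end = start + d
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


def durations_for(slow_args, n_tasks=8, slow_seconds=120):
    # 120 s for the "slow" arguments, 0 s for all the others.
    return [slow_seconds if r in slow_args else 0 for r in range(n_tasks)]


print(simulated_wall_time(durations_for({1}), 4))              # 120.0
print(simulated_wall_time(durations_for({1, 3, 5, 7}), 4))     # 120.0
print(simulated_wall_time(durations_for({1, 2, 3, 5, 7}), 4))  # 240.0
```

With five slow arguments, the fifth one has to wait until one of the four sleeping workers is free again, which is where the second 2 minutes come from.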