Search code examples
pythonmultiprocessingpool

How does multiprocessing Pool(processes=n) handle excessive Pool requests?


Assume I have code like the following--where I need to run a function (with different params) x times. However, my Pool count is less--e.g. x/2.

args_list = []

# Range(8) is just an example. My args are more complex.
for r in Range(8):
    args_list.append(r)

with Pool(4) as proc_pool:
    results = proc_pool.map(my_func, args_list)
    proc_pool.close()
    proc_pool.join()

Will Pool only try to process 4 at a time--then move on to the next 4, or will all 8 be processed at once--but only in 4 Pools?

If Pool will try to process all 8 in 4 Pools at once, what is the best way to handle this? (I can put the with Pool code in a loop to only use 4 Pools at once.)

I read the documentation, but it was not clear to me.


Solution

  • The number passed in Pool's first argument is the number of worker processes in the pool (in this case 4). The map function will run on each argument. Each time a worker finished it's available to be used to run another argument.

    To illustrate this, consider the following:

    import time
    def my_func(r):
       if r == 1:
           time.sleep(120)
       return r * r
    

    The first thing that will happen is that 4 runs will be sent the workers. All of them will finish almost immediately, except for the one which r == 1. As the workers finish, the worker is re-used for another input. So, in the example, 7 of the workers will finish almost immediately but the last one will take about 2 minutes. Since the map function will wait until all workers finish to return the results, the map function will take 2 minutes to finish.

    To give another example:

    import time
    def my_func(r):
       if r in (1, 3, 5, 7):
           time.sleep(120)
       return r * r
    

    Half of the runs will complete almost instantly, where 4 of the runs will take 2 minutes. If five of the runs would take 2 minutes (say for r in (1, 2, 3, 5, 7)), the total time would be 4 minutes, since for 2 minutes 4 processes would be waiting and for 2 minutes 1 process would be waiting.