
Python multiprocessing.Pool.map behavior when list is longer than number of processes


When submitting a list of tasks that is longer than the number of processes, how are the processes assigned to these tasks?

from multiprocessing import Pool

def f(i):
    print(i)
    return i

if __name__ == '__main__':
    with Pool(2) as pool:
        print(pool.map(f, [1, 2, 3, 4, 5]))

I'm running a more complex function, and the tasks don't seem to execute in submission (FIFO) order.


Solution

  • Here's some sample code:

    from multiprocessing import Pool
    from time import sleep
    
    
    def f(x):
        print(x)
        sleep(0.1)
        return x * x
    
    
    if __name__ == '__main__':
        with Pool(2) as pool:
            print(pool.map(f, range(100)))
    

    Which prints out:

    0
    13
    1
    14
    2
    15
    3
    16
    4
    ...
    

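    Note that the interleaving only affects *execution* order: `pool.map` always returns results in input order. If you want tasks handed out one at a time instead of in batches, you can pass the `chunksize` parameter explicitly. A minimal sketch (using a smaller range than the example above for brevity):

```python
from multiprocessing import Pool
from time import sleep

def f(x):
    sleep(0.1)
    return x * x

if __name__ == '__main__':
    with Pool(2) as pool:
        # chunksize=1 hands each worker one task at a time, so execution
        # order tracks submission order much more closely. The returned
        # list is in input order either way.
        results = pool.map(f, range(10), chunksize=1)
        print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

    Smaller chunks give fairer scheduling at the cost of more inter-process communication overhead, which is why the default batches tasks.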
    If we look at the relevant source code, `Pool._map_async` in `multiprocessing/pool.py`:

        def _map_async(self, func, iterable, mapper, chunksize=None, callback=None,
                error_callback=None):
            '''
            Helper function to implement map, starmap and their async counterparts.
            '''
            self._check_running()
            if not hasattr(iterable, '__len__'):
                iterable = list(iterable)
    
            if chunksize is None:
                chunksize, extra = divmod(len(iterable), len(self._pool) * 4)
                if extra:
                    chunksize += 1
            if len(iterable) == 0:
                chunksize = 0
    
            task_batches = Pool._get_tasks(func, iterable, chunksize)
    

    Here `len(iterable) == 100` and `len(self._pool) * 4 == 8`, so `divmod(100, 8)` gives `chunksize, extra = 12, 4`. Since `extra` is nonzero, `chunksize` is bumped to 13. The iterable is therefore split into contiguous batches of 13 tasks, each dispatched to a worker as a unit, which is why the output interleaves task 0 (the start of the first batch) with task 13 (the start of the second).
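    The chunking arithmetic can be reproduced outside the pool. A minimal sketch (the helper names `compute_chunksize` and `chunks` are my own, mirroring the `divmod` logic quoted above):

```python
from itertools import islice

def compute_chunksize(n_tasks, n_workers):
    # Mirrors Pool._map_async: aim for roughly 4 chunks
    # per worker, rounding up when there is a remainder.
    chunksize, extra = divmod(n_tasks, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize

def chunks(iterable, size):
    # Split the tasks into contiguous batches of `size`.
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

size = compute_chunksize(100, 2)
print(size)  # 13, matching divmod(100, 8) == (12, 4) rounded up
batches = list(chunks(range(100), size))
print(len(batches))    # 8 batches: seven of 13 tasks, one of 9
print(batches[0][:3])  # [0, 1, 2]
```

    With two workers, the first worker takes the batch starting at 0 and the second the batch starting at 13, producing the interleaved `0, 13, 1, 14, ...` output shown above.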