multithreading, multiprocessing, parameter-passing, python-multiprocessing

Python multiprocessing - "outer product" of arguments


I'm doing a numerical simulation with multiple input parameters. Some of these parameters are fixed scalars, while others are arrays of values I want to sweep my function over. For example, I may want to simulate the following set of parameters:

a = 1
b = np.arange(1, 11)
c = np.arange(20, 31)
d = 1

Which would mean running

simulate(a = 1, b = 1, c = 20, d = 1)
simulate(a = 1, b = 1, c = 21, d = 1)
...
simulate(a = 1, b = 1, c = 30, d = 1)
simulate(a = 1, b = 2, c = 20, d = 1)
...

i.e. 110 calls to simulate() (10 values of b × 11 values of c). I want to parallelize this with multiprocessing to speed it up. I tried multiprocessing's pool.map(), but the structure it expects for the inputs means instantiating a length-110 list of [a, b, c, d] combinations up front, e.g. [[1, 1, 20, 1], [1, 1, 21, 1], ...].

In reality I have enough parameters, varied over enough dimensions, that I run out of memory just generating the input list for pool.map().
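For concreteness, here is roughly the approach I'm describing (a simplified sketch; simulate here is just a stand-in, and the real code sweeps far more parameters):

import numpy as np
from multiprocessing import Pool

def simulate(args):
    a, b, c, d = args              # pool.map passes a single argument, so unpack here
    return a + b + c + d           # placeholder for the real simulation

if __name__ == '__main__':
    a = 1
    b = np.arange(1, 11)
    c = np.arange(20, 31)
    d = 1
    # Every combination is materialized up front -- this list is what
    # blows up once many parameters are being swept.
    params = [[a, bi, ci, d] for bi in b for ci in c]
    with Pool() as pool:
        results = pool.map(simulate, params)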

What I'd like is to have a function map_wrapper() such that

a = 1
b = np.arange(1, 11)
c = np.arange(20, 31)
d = 1
map_wrapper(simulate, [a, b, c, d])

is equivalent to the 110 calls to simulate() listed above, or a way of using map() or similar to achieve the same thing.


Solution

  • Here's a minimal example showing how to build a generator that yields args for a multiprocessing function:

    import multiprocessing as mp

    def gen_args():
        # Lazily yield one argument tuple at a time instead of
        # building the full list of combinations up front.
        for a in range(4):
            for b in range(4, 8):
                for c in range(8, 12):
                    yield (a, b, c)

    def foo(a, b, c):
        return a + b + c

    if __name__ == '__main__':
        with mp.Pool() as p:
            # starmap unpacks each yielded tuple into foo(a, b, c)
            res = p.starmap(foo, gen_args())


    This will generate 64 tasks without you having to write the argument list out by hand. Note that starmap does consume the whole generator into a list internally before dispatching work, so the small argument tuples still end up in memory at once; if even that is too much, Pool.imap (which does not require a sized iterable) avoids the up-front conversion at the cost of a small unpacking wrapper. Keep in mind you'll still need space for the list of results. The chunksize argument to starmap may or may not improve execution speed (test it both ways to find out), but it will increase memory usage somewhat, as it pulls multiple sets of args at a time for each worker to work on.
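
  • For the map_wrapper(simulate, [a, b, c, d]) interface asked for in the question, one possible sketch (the wrapper and its scalar-vs-array handling are assumptions for illustration, not a standard API) combines itertools.product with starmap, treating scalar parameters as length-1 axes:

    import itertools
    import multiprocessing as mp
    import numpy as np

    def simulate(a, b, c, d):
        return a * b * c * d          # stand-in for the real simulation

    def map_wrapper(func, params):
        # Hypothetical helper: treat scalars as single values and arrays as
        # axes to sweep, then run func over the outer product of all axes.
        axes = [p if np.ndim(p) else [p] for p in params]
        with mp.Pool() as pool:
            # product() yields one combination at a time, so the grid of
            # arguments never has to be written out by hand.
            return pool.starmap(func, itertools.product(*axes))

    if __name__ == '__main__':
        a = 1
        b = np.arange(1, 11)
        c = np.arange(20, 31)
        d = 1
        results = map_wrapper(simulate, [a, b, c, d])   # 10 * 11 = 110 calls

    itertools.product is also a drop-in replacement for a hand-written generator like gen_args() above: product(range(4), range(4, 8), range(8, 12)) yields the same 64 tuples in the same order.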