
Multiprocessing in Python in a for loop and passing multiple arguments


I'm doing a lot of calculations with a Python script. As it is CPU-bound, my usual approach with the threading module didn't yield any performance improvements.

I'm now trying to use multiprocessing instead of multithreading to make better use of my CPU and speed up the lengthy calculations.

I found some example code here on Stack Overflow, but I can't get the script to accept more than one argument. Could somebody help me out with this? I've never used these modules before and I'm pretty sure I'm using Pool.map wrong. Any help is appreciated. Other ways to accomplish multiprocessing are also welcome.

from multiprocessing import Pool

def calculation(foo, bar, foobar, baz):
    # Do a lot of calculations based on the variables
    # Later the result is written to a file.
    result = foo * bar * foobar * baz
    print(result)

if __name__ == '__main__':
    for foo in range(3):
        for bar in range(5):
            for baz in range(4):
                for foobar in range(10):

                    Pool.map(calculation, foo, bar, foobar, baz)
                    Pool.close()
                    Pool.join()

Solution

  • You are, as you suspected, using map wrong, in more ways than one.

    • The point of map is to call a function on all elements of an iterable, just like the builtin map function, but in parallel. If you want to queue a single call, just use apply_async.

    • For the problem you were specifically asking about: map takes a single-argument function. If you want to pass multiple arguments, you can modify or wrap your function to take a single tuple instead of multiple arguments (I'll show this at the end), or just use starmap. Or, if you want to use apply_async, it takes a function of multiple arguments, but you pass apply_async an argument tuple, not separate arguments.

    • You need to call map on a Pool instance, not on the Pool class. What you're trying to do is akin to trying to read from the file type instead of from a particular open file.
    • You're trying to close and join the Pool after every iteration. You don't want to do that until you've finished all of them; otherwise your code will just wait for the first one to finish, and then raise an exception for the second one.

    So, the smallest change that would work is:

    if __name__ == '__main__':
        pool = Pool()
        for foo in range(3):
            for bar in range(5):
                for baz in range(4):
                    for foobar in range(10):
                        pool.apply_async(calculation, (foo, bar, foobar, baz))
        pool.close()
        pool.join()
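    If you later need the computed values rather than printing them in the worker, note that apply_async returns an AsyncResult handle you can collect. A minimal sketch, assuming calculation is simplified to return its product instead of printing it:

    ```python
    from multiprocessing import Pool

    def calculation(foo, bar, foobar, baz):
        return foo * bar * foobar * baz

    if __name__ == '__main__':
        with Pool() as pool:
            # Queue every call and keep the AsyncResult handles.
            handles = [pool.apply_async(calculation, (foo, bar, foobar, baz))
                       for foo in range(3)
                       for bar in range(5)
                       for baz in range(4)
                       for foobar in range(10)]
            # .get() blocks until that particular call has finished.
            results = [h.get() for h in handles]
        print(len(results))  # 600 = 3 * 5 * 4 * 10
    ```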
    

    Notice that I kept everything inside the if __name__ == '__main__': block—including the new Pool() constructor. I won't show this in the later examples, but it's necessary for all of them, for reasons explained in the Programming guidelines section of the docs.1


    If you instead want to use one of the map functions, you need an iterable full of arguments, like this:

    pool = Pool()
    args = ((foo, bar, foobar, baz) 
            for foo in range(3) 
            for bar in range(5) 
            for baz in range(4) 
            for foobar in range(10))
    pool.starmap(calculation, args)
    pool.close()
    pool.join()
    

    Or, more simply:

    import itertools

    pool = Pool()
    pool.starmap(calculation,
                 itertools.product(range(3), range(5), range(4), range(10)))
    pool.close()
    pool.join()
    

    Assuming you're not using an old version of Python, you can simplify it even further by using the Pool in a with statement:

    with Pool() as pool:
        pool.starmap(calculation,
                     itertools.product(range(3), range(5), range(4), range(10)))
    

    One problem with using map or starmap is that it does extra work to make sure you get the results back in order. But you're just returning None and ignoring it, so why do that work?
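    To make that ordering guarantee concrete, here's a small sketch with a trivial stand-in function (multiply is hypothetical, not from the question) showing that starmap hands results back in input order even though the calls run in parallel:

    ```python
    import itertools
    from multiprocessing import Pool

    def multiply(a, b):
        return a * b

    if __name__ == '__main__':
        pairs = list(itertools.product(range(3), range(3)))
        with Pool(2) as pool:
            results = pool.starmap(multiply, pairs)
        # Results come back in the same order as the inputs,
        # regardless of which worker finished first.
        print(results)  # [0, 0, 0, 0, 1, 2, 0, 2, 4]
    ```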

    Using apply_async doesn't have that problem.
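    One way to see that, sketched with the same hypothetical multiply stand-in: pass a callback to apply_async and collect each result the moment it's ready, with no ordering bookkeeping at all:

    ```python
    import itertools
    from multiprocessing import Pool

    def multiply(a, b):
        return a * b

    if __name__ == '__main__':
        collected = []
        with Pool(2) as pool:
            for a, b in itertools.product(range(3), range(3)):
                # The callback runs in the parent process as soon as each
                # call completes, in completion order, not input order.
                pool.apply_async(multiply, (a, b), callback=collected.append)
            pool.close()
            pool.join()  # wait for all queued calls before leaving the pool
        print(sorted(collected))
    ```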

    You can also replace map with imap_unordered, but there is no istarmap_unordered, so you'd need to wrap your function to not need starmap:

    def starcalculate(args):
        return calculation(*args)

    with Pool() as pool:
        # imap_unordered is lazy: iterate over it so that every call
        # actually runs before the pool shuts down.
        for _ in pool.imap_unordered(starcalculate,
                                     itertools.product(range(3), range(5), range(4), range(10))):
            pass
    

    1. If you're using the spawn or forkserver start methods (and spawn is the default on Windows), every child process does the equivalent of importing your module. So, all top-level code that isn't protected by a __main__ guard will get run in every child. The module tries to protect you from some of the worst consequences of this (e.g., instead of forkbombing your computer with an exponential explosion of children creating new children, you will often get an exception), but it can't make the code actually work.