I'm doing a lot of calculations with a Python script. As they are CPU-bound, my usual approach with the threading module didn't yield any performance improvements.
I'm now trying to use multiprocessing instead of multithreading to make better use of my CPU and speed up the lengthy calculations.
I found some example code here on Stack Overflow, but I can't get the script to accept more than one argument. Could somebody help me out with this? I've never used these modules before and I'm pretty sure I'm using Pool.map wrong. Any help is appreciated. Other ways to accomplish multiprocessing are also welcome.
from multiprocessing import Pool

def calculation(foo, bar, foobar, baz):
    # Do a lot of calculations based on the variables.
    # Later the result is written to a file.
    result = foo * bar * foobar * baz
    print(result)
if __name__ == '__main__':
    for foo in range(3):
        for bar in range(5):
            for baz in range(4):
                for foobar in range(10):
                    Pool.map(calculation, foo, bar, foobar, baz)
    Pool.close()
    Pool.join()
You are, as you suspected, using map wrong, in more ways than one.

The point of map is to call a function on all elements of an iterable, just like the builtin map function, but in parallel. If you want to queue a single call, just use apply_async.
For the problem you were specifically asking about: map takes a single-argument function. If you want to pass multiple arguments, you can modify or wrap your function to take a single tuple instead of multiple arguments (I'll show this at the end), or just use starmap. Or, if you want to use apply_async, it takes a function of multiple arguments, but you pass apply_async an argument tuple, not separate arguments.
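To make those three calling conventions concrete, here's a quick sketch using a toy two-argument mul function (my own stand-in, not your calculation):

```python
from multiprocessing import Pool

def mul(a, b):
    # Toy two-argument function standing in for a real calculation.
    return a * b

if __name__ == '__main__':
    with Pool(2) as pool:
        # starmap: each tuple in the iterable is unpacked into arguments.
        products = pool.starmap(mul, [(2, 3), (4, 5)])   # [6, 20]
        # apply_async: queues one call; the arguments go in as a single tuple.
        single = pool.apply_async(mul, (6, 7)).get()     # 42
        # plain map would need a one-argument function, e.g. a tuple-taking wrapper.
    print(products, single)
```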
Beyond that:

- You're calling map on the Pool class, not a Pool instance. What you're trying to do is akin to trying to read from the file type instead of reading from a particular open file.
- You're calling close and join on the Pool after every iteration. You don't want to do that until you've finished all of them, or your code will just wait for the first one to finish, and then raise an exception for the second one.

So, the smallest change that would work is:
if __name__ == '__main__':
    pool = Pool()
    for foo in range(3):
        for bar in range(5):
            for baz in range(4):
                for foobar in range(10):
                    pool.apply_async(calculation, (foo, bar, foobar, baz))
    pool.close()
    pool.join()
Notice that I kept everything inside the if __name__ == '__main__': block, including the new Pool() constructor. I won't show this in the later examples, but it's necessary for all of them, for reasons explained in the Programming guidelines section of the docs.1
If you instead want to use one of the map functions, you need an iterable full of arguments, like this:
pool = Pool()
args = ((foo, bar, foobar, baz)
        for foo in range(3)
        for bar in range(5)
        for baz in range(4)
        for foobar in range(10))
pool.starmap(calculation, args)
pool.close()
pool.join()
Or, more simply:
import itertools

pool = Pool()
pool.starmap(calculation,
             itertools.product(range(3), range(5), range(4), range(10)))
pool.close()
pool.join()
Assuming you're not using an old version of Python, you can simplify it even further by using the Pool in a with statement:
with Pool() as pool:
    pool.starmap(calculation,
                 itertools.product(range(3), range(5), range(4), range(10)))
One problem with using map or starmap is that it does extra work to make sure you get the results back in order. But you're just returning None and ignoring it, so why do that work? Using apply_async doesn't have that problem.
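Here's a small contrived demo of that ordering guarantee (the sleep times are rigged so tasks tend to finish in reverse order): map always returns results in input order, while imap_unordered yields them as they complete:

```python
from multiprocessing import Pool
import time

def slow_id(x):
    # Smaller inputs sleep longer, so completion order is roughly reversed.
    time.sleep((4 - x) * 0.05)
    return x

if __name__ == '__main__':
    with Pool(4) as pool:
        ordered = pool.map(slow_id, range(4))                     # always [0, 1, 2, 3]
        unordered = list(pool.imap_unordered(slow_id, range(4)))  # completion order
    print(ordered, unordered)
```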
You can also replace map with imap_unordered, but there is no istarmap_unordered, so you'd need to wrap your function so it doesn't need starmap:
def starcalculate(args):
    return calculation(*args)

with Pool() as pool:
    # Consume the lazy iterator so we actually wait for every task
    # before the with block shuts the pool down.
    for _ in pool.imap_unordered(starcalculate,
                                 itertools.product(range(3), range(5),
                                                   range(4), range(10))):
        pass
1. If you're using the spawn or forkserver start methods—and spawn is the default on Windows—every child process does the equivalent of importing your module. So, all top-level code that isn't protected by a __main__ guard will get run in every child. The module tries to protect you from some of the worst consequences of this (e.g., instead of forkbombing your computer with an exponential explosion of children creating new children, you will often get an exception), but it can't make the code actually work.
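As a minimal sketch of why the guard matters, you can force the spawn start method on any platform with multiprocessing.get_context: the top-level definitions are re-imported in every child (which is fine), while everything inside the guard runs only in the parent:

```python
import multiprocessing as mp

def square(x):
    # Top-level definitions like this get re-created when a spawned
    # child re-imports the module; that's fine and expected.
    return x * x

if __name__ == '__main__':
    # Everything in this block runs only in the parent process. Without
    # the guard, each spawned child would try to create its own Pool.
    ctx = mp.get_context('spawn')
    with ctx.Pool(2) as pool:
        print(pool.map(square, range(4)))  # [0, 1, 4, 9]
```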