Despite reading here, here, here, and so many other similar posts, I still cannot parallelize my problem. These are the for
loops that I have:
a = [1,11]
b = [2,22,222]
c = [3,33,333,3333]
results_01 = []
results_02 = []
results_03 = []
for i in range(len(a)):
    for j in range(len(b)):
        for k in range(len(c)):
            r_01 = [a[i] + b[j] + c[k]]
            r_02 = [a[i] - b[j] - c[k]]
            r_03 = [a[i] * b[j] * c[k]]
            results_01.append(r_01)
            results_02.append(r_02)
            results_03.append(r_03)
I need to parallelize this AND keep track of which combination of i, j, and k corresponds to each final answer (e.g. I need to know which final answers correspond to a[1], b[2], and c[3]). I have tried various methods and none of them works, but the one that seems most logical to me is the following:
import multiprocessing as mp
from multiprocessing import Pool

num_processes = mp.cpu_count()-12

def parallelize(i,j,k):
    r_01 = [i + j + k]
    r_02 = [i - j - k]
    r_03 = [i * j * k]
    return r_01, r_02, r_03

if __name__ == '__main__':
    __spec__ = "ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>)" # this is because I am using Spyder!
    a = [1,11]
    b = [2,22,222]
    c = [3,33,333,3333]
    pool = Pool(processes = num_processes)
    results = pool.map(parallelize(a[i],b[j],c[k]), [p for p in range(num_processes)])
    pool.close()
    pool.join()
    results_01 = [i[0] for i in results]
    results_02 = [i[1] for i in results]
    results_03 = [i[2] for i in results]
This gives me the error "name 'i' is not defined", which makes complete sense, but since I am new to multiprocessing I have no idea how else I could do this. Could anyone help me with this, please?
P.S. This is a very simplified problem I have made up! In reality my problem is much more complex, but solving this can help me to solve my real problem.
Try this:
results = pool.starmap(parallelize, [(ai, bj, ck) for ai in a for bj in b for ck in c])
Some explanations:
pool.map only works for functions with one argument. For functions with more than one argument, you can use pool.starmap for convenience, which "unpacks" each argument tuple just like calling parallelize(*tuple).
When using pool.map or pool.starmap, you need to pass the function itself as a parameter, rather than a single invocation of it; the entire point is to have other processes do your work for you. This means no parentheses after the function name.
You don't need to split the tasks by num_processes. Just pass it a list of all the tasks you want to do and let the pool do the rest. (Unless each individual task is too little work, in which case you may want to combine them to reduce the overhead.)
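To also keep track of which i, j, and k produced each answer, one option is to build the index triples alongside the argument tuples and pair them with the results afterwards, since starmap returns results in the same order as its input. The following is only a minimal sketch under that assumption; the names compute, index_triples, and run are made up for illustration, and the pool size of 2 is an arbitrary choice (Pool raises ValueError if it is given fewer than 1 process, which cpu_count()-12 can produce on smaller machines).

```python
import itertools
import multiprocessing as mp


def compute(ai, bj, ck):
    """Return the sum, difference and product for one combination."""
    return ai + bj + ck, ai - bj - ck, ai * bj * ck


def index_triples(a, b, c):
    """List every (i, j, k) index combination in nested-loop order."""
    return list(itertools.product(range(len(a)), range(len(b)), range(len(c))))


def run(a, b, c, processes=2):
    """Evaluate compute() for all combinations, keyed by (i, j, k)."""
    indices = index_triples(a, b, c)
    tasks = [(a[i], b[j], c[k]) for i, j, k in indices]
    with mp.Pool(processes=processes) as pool:
        # starmap blocks until all tasks finish and preserves input order
        results = pool.starmap(compute, tasks)
    return dict(zip(indices, results))


if __name__ == "__main__":
    a = [1, 11]
    b = [2, 22, 222]
    c = [3, 33, 333, 3333]
    by_index = run(a, b, c)
    # Look up the three results for a[1], b[2], c[3]
    r_01, r_02, r_03 = by_index[(1, 2, 3)]
    print(r_01, r_02, r_03)
```

If each task is very cheap, as it is here, starmap also accepts a chunksize argument that hands several tasks to a worker at once, which may reduce the per-task overhead.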