I'm trying to construct a DataFrame from the inputs of a function as well as its output. Previously I was using nested for loops:
for i in range(x):
    for j in range(y):
        k = func(i, j)
        # (Place i, j, k into dataframe)
However, the ranges were quite large, so I tried to speed it up with multiprocessing.Pool():
with mp.Pool() as pool:
    result = pool.starmap(func, ((i, j) for j in range(y) for i in range(x)))
# (Place result into dataframe)
However, with the pool I no longer have access to i and j, as they are merely inputs to the function.
I tried having the function return its inputs as well, but that doesn't scale as the number of nested loops increases. So how can I get back the arguments that were passed into starmap?
Your starmap version and the normal version are not equivalent. When a generator expression has multiple for clauses, the first one is the outer loop. So the call should rather be:
result = pool.starmap(func, ((i, j) for i in range(x) for j in range(y)))
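A quick way to check the clause ordering without involving multiprocessing at all: compare the tuples produced by both generator expressions against explicit nested loops. The sizes x = 2 and y = 3 are arbitrary and only chosen to make the difference visible.

```python
# Arbitrary small sizes, just to make the ordering visible
x, y = 2, 3

# Reference order: explicit nested loops, i outer, j inner
nested = []
for i in range(x):
    for j in range(y):
        nested.append((i, j))

# The first for clause is the outer loop, so this matches the nested loops
outer_i = list((i, j) for i in range(x) for j in range(y))
# Swapping the clauses makes j the outer loop instead
outer_j = list((i, j) for j in range(y) for i in range(x))

print(outer_i == nested)  # True
print(outer_j == nested)  # False
print(outer_j)  # [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
```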
Coming back to the question: as I mentioned in the comments, starmap returns the task results in the same order the tasks were submitted. So, given that the only thing you wanted to parallelize was the func calls, you can simply collect all the results in one list, chunk it based on the value of y (the number of columns), and run another set of for loops outside the pool to get the values of i and j and the return value of func at the same time. Example:
import multiprocessing as mp


def func(i, j):
    # Stand-in for the real, expensive function
    return f"{i}{j}"


# https://stackoverflow.com/a/17483656/16310741
def chunks(l, n):
    # Split the flat list l into consecutive sublists of length n
    return [l[i:i+n] for i in range(0, len(l), n)]


if __name__ == "__main__":
    x = 3
    y = 4
    with mp.Pool() as pool:
        # ['00', '01', '02', '03', '10', '11', '12', '13', '20', '21', '22', '23']
        result = pool.starmap(func, ((i, j) for i in range(x) for j in range(y)))
    # [['00', '01', '02', '03'], ['10', '11', '12', '13'], ['20', '21', '22', '23']]
    result = chunks(result, y)
    for i in range(x):
        for j in range(y):
            curr_result = result[i][j]
            print(i, j, curr_result)
            # Do something with i, j, and curr_result
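If the end goal is a DataFrame, another option is sketched below, assuming the full list of argument tuples fits in memory (CPython's Pool.starmap already turns a generator into a list internally, so this should cost little extra). Build the argument list once, pass it to starmap, and zip it with the results. Because starmap keeps the submission order, each result lines up with its own arguments, so no chunking is needed, and itertools.product extends to any number of nested loops. build_rows is a hypothetical helper, and pandas only appears in a comment so the sketch stays dependency-free.

```python
import itertools
import multiprocessing as mp


def func(i, j):
    # Stand-in for the real function from the question
    return f"{i}{j}"


def build_rows(args, results):
    # Hypothetical helper: pair each argument tuple with its result.
    # Relies on starmap returning results in the same order as args.
    return [(*a, r) for a, r in zip(args, results)]


if __name__ == "__main__":
    x, y = 3, 4
    # Materialize the arguments so they can be reused after the pool call;
    # itertools.product yields them in the same order as nested for loops.
    args = list(itertools.product(range(x), range(y)))
    with mp.Pool() as pool:
        results = pool.starmap(func, args)
    rows = build_rows(args, results)
    for row in rows:
        print(*row)  # each line is: i j result
    # With pandas installed, the rows could become a DataFrame, e.g.
    # pd.DataFrame(rows, columns=["i", "j", "k"])
```

Adding a third loop variable only means adding another range to itertools.product; build_rows itself does not change.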