Python concurrent.futures


I have multiprocessing code in which each process has to analyse the same data differently.

I have implemented:

with concurrent.futures.ProcessPoolExecutor() as executor:
   res = executor.map(goal_fcn, p, [global_DataFrame], [global_String])

for f in concurrent.futures.as_completed(res):
   fp = res

and function:

def goal_fcn(x, DataFrame, String):
   return heavy_calculation(x, DataFrame, String)

The problem is that goal_fcn is called only once, while it should be called multiple times.

In the debugger, I checked how the variable p looks: it has multiple columns and rows. Inside goal_fcn, the variable x holds only the first row, which looks correct.

But the function is called only once. There is no error; the code just continues with the next steps.

Even if I change p to [1, 3, 4, 5] (and adjust the code accordingly), goal_fcn is executed only once.

I have to use map() because the order between input and output must be preserved.


Solution

  • map works like zip: it stops as soon as the shortest input sequence is exhausted. Your [global_DataFrame] and [global_String] lists have one element each, so map ends after a single call.
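The truncation is easy to reproduce without any executor. Below is a minimal sketch using the built-in map; `add` and `fixed` are made-up stand-ins for goal_fcn and [global_DataFrame], not names from the question.

```python
def add(x, y):
    # Hypothetical stand-in for goal_fcn.
    return x + y


p = [1, 3, 4, 5]
fixed = [100]  # one-element list, playing the role of [global_DataFrame]

# map pairs the inputs up like zip and stops at the shortest one,
# so add is called exactly once even though p has four elements.
results = list(map(add, p, fixed))
pairs = list(zip(p, fixed))

print(results)
print(pairs)
```

executor.map pairs its iterables the same way, which is why only the first row of p ever reaches goal_fcn.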

    There are two ways around this:

    1. Use itertools.product. This is the equivalent of running "for all data frames, for all strings, for all p". Something like this:
    import itertools

    def goal_fcn(x_DataFrame_String):
        # Each item from product() is one (x, DataFrame, String) tuple.
        x, DataFrame, String = x_DataFrame_String
        ...

    executor.map(goal_fcn, itertools.product(p, [global_DataFrame], [global_String]))
    
    2. Bind the fixed arguments instead of abusing the sequence arguments.
    import functools

    def goal_fcn(x, DataFrame, String):
        pass

    # Only x is left unbound, so map iterates over p alone.
    bound = functools.partial(goal_fcn, DataFrame=global_DataFrame, String=global_String)
    executor.map(bound, p)
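For completeness, here is a self-contained sketch of the functools.partial approach. Everything beyond the calls already shown is an assumption made for illustration: goal_fcn does a trivial computation instead of heavy_calculation, a plain list stands in for the DataFrame, and run_all with its executor_cls parameter is a hypothetical helper, not part of the question's code.

```python
import concurrent.futures
import functools


def goal_fcn(x, data, label):
    # Placeholder for heavy_calculation (assumption: any picklable,
    # module-level function works the same way here).
    return f"{label}:{x * len(data)}"


def run_all(p, data, label, executor_cls=concurrent.futures.ProcessPoolExecutor):
    # Bind the arguments that are identical for every task; only x varies.
    bound = functools.partial(goal_fcn, data=data, label=label)
    with executor_cls() as executor:
        # map yields plain results (not futures) in the same order as p,
        # so they are collected directly inside the with-block.
        return list(executor.map(bound, p))


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this on import.
    print(run_all([1, 3, 4, 5], data=[10, 20, 30], label="run"))
```

Since map already returns results in input order, there should be no need to pass them through as_completed, which expects Future objects rather than results.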