
Tracking the results of the multiprocessing version of the for-loop


I have a function that I apply sequentially to a list of objects and that returns a score for each object, as follows:

import numpy as np

def get_score(a):
    # do something with `a` and compute its score
    score = ...
    return score

objects = [obj0, obj1, obj2]
results = np.zeros(len(objects))
for i in range(len(objects)):
    results[i] = get_score(objects[i])

I want to parallelize the execution of this function with the multiprocessing library, but how can I tell which score corresponds to which object, given that the worker processes will not share the results list?


Solution

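  • One possible solution is to pass each object's index to get_score together with the object, and have get_score return that index along with its result (the score). Each result can then be stored at the right position, even when results arrive out of order.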

    Example:

    from multiprocessing import Pool
    
    
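    def get_score(tpl):
        # tpl is (index, (par1, par2)), as produced by enumerate(zip(par1, par2))
        i, (par1, par2) = tpl
    
        # do something with par1 and par2 to compute the result
        # (the f"{name=}" specifier requires Python 3.8+)
        return i, f"{par1=} {par2=} processed"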
    
    
    if __name__ == "__main__":
        par1 = ["obj1", "obj2", "obj3"]
        par2 = ["par2_1", "par2_2", "par2_3"]
        # ...
    
        results = [None] * len(par1)
    
        with Pool() as p:
            # results arrive in completion order, so use the returned index to store each one
            for i, result in p.imap_unordered(get_score, enumerate(zip(par1, par2))):
                results[i] = result
    
        print(results)
    

    Prints:

    ["par1='obj1' par2='par2_1' processed", "par1='obj2' par2='par2_2' processed", "par1='obj3' par2='par2_3' processed"]
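If you don't need to handle each result the moment it finishes, the index bookkeeping can be skipped entirely: Pool.map (like Pool.imap) returns results in the same order as the input iterable. If you do want results as they complete, a dict mapping each concurrent.futures future to its object's index serves the same purpose as the returned index above. The sketch below shows both; get_score here is a hypothetical stand-in that scores an object by the length of its string form, and the objects are made-up strings.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed
from multiprocessing import Pool


def get_score(obj):
    # Hypothetical stand-in for the question's get_score:
    # the "score" is just the length of the object's string form.
    return len(str(obj))


def scores_with_map(objects):
    # Pool.map returns results in input order, so results[i]
    # is the score of objects[i] without passing any indices.
    with Pool() as p:
        return p.map(get_score, objects)


def scores_with_futures(objects):
    # Key each future by the index of the object it was submitted for,
    # then store each result in its slot as soon as it completes.
    results = [None] * len(objects)
    with ProcessPoolExecutor() as ex:
        future_to_index = {
            ex.submit(get_score, obj): i for i, obj in enumerate(objects)
        }
        for fut in as_completed(future_to_index):
            results[future_to_index[fut]] = fut.result()
    return results


if __name__ == "__main__":
    objects = ["a", "bb", "ccc"]  # hypothetical objects
    print(scores_with_map(objects))      # [1, 2, 3]
    print(scores_with_futures(objects))  # [1, 2, 3]
```

Note that Pool.map blocks until every result is ready, whereas imap_unordered with a returned index (as in the solution) or as_completed with a future-to-index dict lets you start processing each score as soon as it is available.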