
Python Parallel Processing Syntax for Functions with Multiple Arguments


Hi, I'm trying to run a function in parallel on multiple cores because the program takes a long time to run. I could not find the multiprocessing syntax for functions with multiple arguments. My code is below, and I have no idea how to fix the syntax.

import pandas as pd
import numpy as np
import numpy.random as rdm
import matplotlib.pyplot as plt
import math as m
import random as r
import time
import multiprocessing
from joblib import Parallel, delayed

def firstLoop(r1, r2, d):
    count = 0
    for i in range(r1):
        for j in range(r2):
            if(findDistance(dat1[i, 0], dat2[j, 0], dat1[i, 1], dat2[j, 1]) <= d):
                count = count + 1
    return count

food1 = range(r1)
atm1 = range(r2)
d = 100
num_cores = multiprocessing.cpu_count()
results = Parallel(n_jobs=num_cores)(delayed(firstLoop)(for i in food1, for j in atm1, d))

I'm trying to run firstLoop on multiple cores over all the elements of food1 and atm1, but I'm unsure of the syntax.

EDIT:

startTime = time.time()
with mp.Pool(processes = mp.cpu_count()) as p:
    p.starmap(f, [(x, y, d) for x in range(r1) for y in range(r2) for d in range(100, 200, 100)])
print(time.time()- startTime)

Solution

  • An easy way to use multiprocessing with a function that takes several parameters is Pool() with starmap(), which unpacks each tuple of arguments into one call of the function:

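    import multiprocessing as mp
    
    def f(x, y):
        return x + y
    
    # The guard is needed on platforms that start workers by re-importing this module
    if __name__ == "__main__":
        with mp.Pool(processes=mp.cpu_count()) as p:
            # Each (x, y) tuple becomes one call f(x, y); results keep the input order
            results = p.starmap(f, [(x, y) for x in range(10) for y in range(10)])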
    

    For this to work, the function f must perform one elementary step of the computation, so that each call is an independent task.

    In your code:

    def firstLoop(r1, r2, d):
        count = 0
        for i in range(r1):
            for j in range(r2):
                if(findDistance(dat1[i, 0], dat2[j, 0], dat1[i, 1], dat2[j, 1]) <= d):
                    count = count + 1
        return count
    

    The elementary step is the test findDistance(dat1[i, 0], dat2[j, 0], dat1[i, 1], dat2[j, 1]) <= d. A possible function f could be:

    def f(i, j, d):
        # Return 1 if the pair (i, j) is within distance d, else 0
        if findDistance(dat1[i, 0], dat2[j, 0], dat1[i, 1], dat2[j, 1]) <= d:
            return 1
        else:
            return 0
    

    Then we initialize the result array and the variables:

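    r1 = ...  # placeholder: number of rows of dat1 to scan
    r2 = ...  # placeholder: number of rows of dat2 to scan
    d = 100
    results = np.zeros(shape=(1, r1*r2))  # overwritten below: starmap returns a new list
    
    # Compute: one task per (i, j) pair
    with mp.Pool() as p:
        results = p.starmap(f, [(i, j, d) for i in range(r1) for j in range(r2)])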
    

    To get the count, simply sum the results, i.e. sum(results). I have not tested this code. The idea is there, but some adaptation may be needed, especially in how the results are returned from starmap and stored. I don't use this syntax often, since I usually write the results to a file at the end of the function being parallelized.

    Creating results beforehand is not actually needed: starmap returns a new list, which simply replaces the preallocated array.
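
For reference, here is a minimal self-contained sketch of the same idea that can be run as a script. Several things in it are assumptions and not taken from the question: find_distance is a hypothetical stand-in for findDistance (plain Euclidean distance, same argument order), the data are random (x, y) tuples in plain lists instead of the arrays dat1 and dat2, and the helper names (count_serial, count_row, count_parallel, make_points) are made up for illustration. It also submits one task per row of dat1 instead of one per (i, j) pair, which should reduce the per-task overhead, although I have not measured it.

```python
import math
import multiprocessing as mp
import random


def find_distance(x1, x2, y1, y2):
    """Hypothetical stand-in for findDistance: plain Euclidean distance."""
    return math.hypot(x1 - x2, y1 - y2)


def count_serial(dat1, dat2, d):
    """Reference version: the double loop from the question, on lists of (x, y)."""
    count = 0
    for (x1, y1) in dat1:
        for (x2, y2) in dat2:
            if find_distance(x1, x2, y1, y2) <= d:
                count += 1
    return count


def count_row(point, dat2, d):
    """One task: count the points of dat2 within distance d of a single point."""
    x1, y1 = point
    return sum(1 for (x2, y2) in dat2 if find_distance(x1, x2, y1, y2) <= d)


def count_parallel(dat1, dat2, d, processes=None):
    """Run count_row for every point of dat1 in a pool and add up the counts."""
    # Each tuple is unpacked by starmap into count_row(point, dat2, d).
    tasks = [(point, dat2, d) for point in dat1]
    with mp.Pool(processes=processes) as pool:
        return sum(pool.starmap(count_row, tasks))


def make_points(n, seed):
    """Generate n random (x, y) points; purely illustrative test data."""
    rng = random.Random(seed)
    return [(rng.uniform(0, 1000), rng.uniform(0, 1000)) for _ in range(n)]


if __name__ == "__main__":
    dat1 = make_points(200, seed=1)
    dat2 = make_points(300, seed=2)
    d = 100
    # The parallel count should match the serial double loop.
    print(count_parallel(dat1, dat2, d) == count_serial(dat1, dat2, d))
```

Note that this version sends dat2 along with every task, so it gets pickled once per row. For large data it may be worth loading the data once per worker instead (for example through the initializer argument of Pool), but that is left out here to keep the sketch short.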