
Python multiprocessing (joblib) best way for argument passing


I've noticed a huge delay when using multiprocessing (with joblib). Here is a simplified version of my code:

import numpy as np
from joblib import Parallel, delayed

class Matcher(object):
    def match_all(self, arr1, arr2):
        args = ((elem1, elem2) for elem1 in arr1 for elem2 in arr2)

        results = Parallel(n_jobs=-1)(delayed(_parallel_match)(self, e1, e2) for e1, e2 in args)
        # ...

    def match(self, i1, i2):
        return i1 == i2

def _parallel_match(m, i1, i2):
    return m.match(i1, i2)

matcher = Matcher()
matcher.match_all(np.ones(250), np.ones(250))

Run as shown above, it takes about 30 seconds to complete and uses almost 200 MB of memory. If I just change the n_jobs parameter of Parallel to 1, it takes only 1.80 seconds and uses barely 50 MB.

I suppose it is related to the way I pass the arguments, but I haven't found a better way to do it.

I'm using Python 2.7.9


Solution

  • I rewrote the code without the joblib library and now it works as expected, although the code is not as "beautiful":

    import itertools
    import multiprocessing
    import numpy as np
    
    
    class Matcher(object):
        def match_all(self, a1, a2):
            args = ((elem1, elem2) for elem1 in a1 for elem2 in a2)
            args = zip(itertools.repeat(self), args)
    
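            pool = multiprocessing.Pool()
            # np.fromiter needs an explicit dtype; match() returns booleans
            results = np.fromiter(pool.map(_parallel_match, args), dtype=bool)
            # Release the worker processes once all results are collected
            pool.close()
            pool.join()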
            # ...
    
        def match(self, i1, i2):
            return i1 == i2
    
    def _parallel_match(args):
        # args is one (matcher, (elem1, elem2)) tuple built by zip() in match_all
        m, (i1, i2) = args
        return m.match(i1, i2)
    
    matcher = Matcher() 
    matcher.match_all(np.ones(250), np.ones(250))
    

    This version works like a charm, and takes only 0.58 secs to complete...

    So why is it so slow with joblib? I don't really understand it, but my guess is that joblib is making copies of the whole array for every single process...
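
One possible explanation for the gap (an assumption, not something measured here) is per-task overhead rather than array copying. The joblib version submits 250 × 250 = 62,500 tiny tasks, each carrying the matcher and two scalars, and depending on the joblib version those may be dispatched one at a time. `pool.map`, by contrast, groups the items into chunks before sending them to the workers. The sketch below makes that grouping explicit with the standard library only. The names `pair_batches`, `match_batch` and `match_all_batched`, and the default `batch_size=5000`, are hypothetical and chosen only for illustration.

```python
import multiprocessing


class Matcher(object):
    def match(self, i1, i2):
        return i1 == i2


def pair_batches(a1, a2, batch_size):
    """Yield lists of (elem1, elem2) pairs, each holding at most batch_size pairs."""
    batch = []
    for elem1 in a1:
        for elem2 in a2:
            batch.append((elem1, elem2))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # final, possibly shorter, batch
        yield batch


def match_batch(task):
    """Worker: task is a (matcher, batch) tuple; returns one result per pair."""
    m, batch = task
    return [m.match(i1, i2) for i1, i2 in batch]


def match_all_batched(m, a1, a2, batch_size=5000, processes=None):
    """Compare every pair, sending one task per batch instead of one per pair."""
    # batch_size=5000 is an arbitrary illustrative default, not a tuned value
    tasks = [(m, batch) for batch in pair_batches(a1, a2, batch_size)]
    pool = multiprocessing.Pool(processes)
    try:
        per_batch = pool.map(match_batch, tasks)
    finally:
        pool.close()
        pool.join()
    # Flatten the per-batch lists back into one flat list of results
    return [r for results in per_batch for r in results]
```

Calling `match_all_batched(Matcher(), np.ones(250), np.ones(250))` from under an `if __name__ == "__main__":` guard would pickle the matcher once per batch (13 times with the default above) instead of once per pair. Whether this closes the gap with joblib on a given machine is something to time rather than assume.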