Intermediate results from joblib


I'm trying to learn the joblib module as an alternative to the built-in multiprocessing module in Python. I'm used to using multiprocessing.Pool.imap to run a function over an iterable and get the results back as they come in. I can't figure out how to do the same with joblib in this minimal working example:

import time

import joblib

def hello(n):
    time.sleep(1)
    print("Inside function", n)
    return n

with joblib.Parallel(n_jobs=1) as MP:
    func = joblib.delayed(hello)
    for x in MP(func(x) for x in range(3)):
        print("Outside function", x)

Which prints:

Inside function 0
Inside function 1
Inside function 2
Outside function 0
Outside function 1
Outside function 2

I'd like to see the output:

Inside function 0
Outside function 0
Inside function 1
Outside function 1
Inside function 2
Outside function 2

Or something similar, indicating that the iterable MP(...) is not waiting for all the results to complete. For a longer demo, change to n_jobs=-1 and range(100).
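
For comparison, here is a minimal sketch of the imap behaviour I mean. It assumes `multiprocessing.pool.ThreadPool`, which offers the same `imap()` interface as `multiprocessing.Pool`, so that it runs without pickling concerns. The name `run_imap` is only for illustration, and the sleep is shortened.

```python
import time
from multiprocessing.pool import ThreadPool

def hello(n):
    time.sleep(0.1)
    print("Inside function", n)
    return n

def run_imap(count=3):
    seen = []
    # ThreadPool exposes the same imap() interface as multiprocessing.Pool
    with ThreadPool(processes=1) as pool:
        # imap() yields each result as soon as it is ready, in input order
        for x in pool.imap(hello, range(count)):
            print("Outside function", x)
            seen.append(x)
    return seen

if __name__ == "__main__":
    run_imap()
```

Here the loop body runs for each item while later items are still being computed, which is what I would like joblib to do.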


Solution

  • To get immediate results from joblib, you can subclass the multiprocessing backend and hook your own callback into apply_async, for instance:

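    import joblib
    from joblib import delayed
    # Private module: this import path may change between joblib versions
    from joblib._parallel_backends import MultiprocessingBackend

    class ImmediateResult_Backend(MultiprocessingBackend):
        def callback(self, result):
            # result is the list of return values of one completed batch
            print("\tImmediateResult function %s" % (result))

        # Overload apply_async so that self.callback fires for every batch
        def apply_async(self, func, callback=None):
            def chained(result):
                self.callback(result)
                # Also run joblib's own callback so it keeps dispatching tasks
                if callback is not None:
                    callback(result)
            return super().apply_async(func, chained)

    joblib.register_parallel_backend('custom', ImmediateResult_Backend, make_default=True)

    # hello() is the function defined in the question
    with joblib.Parallel(n_jobs=2) as parallel:
        results = parallel(delayed(hello)(y) for y in range(3))
        for r in results:
            print("Outside function %s" % (r))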
    

    Output:
    Note: I used time.sleep(n * random.randrange(1,5)) in hello(...), so the processes finish at different times.

    Inside function 0
    Inside function 1
    ImmediateResult function [0]
    Inside function 2
    ImmediateResult function [1]
    ImmediateResult function [2]
    Outside function 0
    Outside function 1
    Outside function 2

    Tested with Python 3.4.2 and joblib 0.11.
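
The backend above relies on the `callback` argument of a pool's `apply_async`. Below is a minimal standard-library sketch of that same mechanism, assuming `multiprocessing.pool.ThreadPool` in place of joblib. The names `run_with_callback` and `on_result` are hypothetical and exist only for this illustration.

```python
import time
from multiprocessing.pool import ThreadPool

def hello(n):
    time.sleep(0.05)
    return n

def run_with_callback(count=3):
    immediate = []  # filled by the callback as each task completes

    def on_result(result):
        # Called by the pool once a task has finished successfully
        print("\tImmediateResult function %s" % result)
        immediate.append(result)

    with ThreadPool(processes=2) as pool:
        pending = [pool.apply_async(hello, (n,), callback=on_result)
                   for n in range(count)]
        # get() blocks until the corresponding task is done
        ordered = [p.get() for p in pending]
    return immediate, ordered

if __name__ == "__main__":
    immediate, ordered = run_with_callback()
    for value in ordered:
        print("Outside function %s" % value)
```

The callback sees each result as it completes, possibly out of submission order, while the list built from `get()` keeps the submission order, which matches the pattern in the output above.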