python, jupyter-notebook, multiprocessing, anaconda3

Jupyter notebook issues with Multiprocessing Pool


I'm trying to use multiprocessing in my code and came across this example:

import multiprocessing
from itertools import product

def merge_names(a, b):
    return '{} & {}'.format(a, b)

if __name__ == '__main__':
    names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
    with multiprocessing.Pool(processes=3) as pool:
        results = pool.starmap(merge_names, product(names, repeat=2))
    print(results)

This should take no more than a few seconds, but when I run it in Jupyter Notebook it never finishes, and I have to restart the kernel. Are there any known issues with using multiprocessing in Jupyter or Anaconda?

I'm using

conda version 4.8.4
ipython version 5.8.0

Solution

  • This is not really an answer, but since comments cannot format code nicely, I'll put it here. Your code does not work for me even in pure Python 3.8 (installed through conda, though), so I do not think the problem is tied to Jupyter or IPython.

    This code works for me:

    import multiprocessing
    from itertools import product
    
    names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
    with multiprocessing.Pool(processes=3) as pool:
        results = pool.starmap('{} & {}'.format, product(names, repeat=2))
    print(results)
    

    Thus it seems that there is some issue with pickling the custom function and sending it to the pool. I do not know the cause or the solution for that.
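The answer leaves the cause open. One plausible explanation, which is an assumption here and not something the answer confirms, is that pickle serializes a plain function by reference (module name plus function name), so a worker process that cannot import the notebook's `__main__` cannot rebuild it. A bound method of a built-in `str`, such as `'{} & {}'.format`, is rebuilt from the string itself. A minimal sketch of that difference, using only the standard library:

```python
import pickle

def merge_names(a, b):
    return '{} & {}'.format(a, b)

# A plain function is pickled by reference: the payload stores the module
# and function name, not the function body, so whoever unpickles it must
# be able to import that module and find the name in it.
func_payload = pickle.dumps(merge_names)
print(b'merge_names' in func_payload)

# A bound method of a built-in str is pickled as the string object plus
# the method name, so no user-defined module is needed to rebuild it.
method_payload = pickle.dumps('{} & {}'.format)
restored = pickle.loads(method_payload)
print(restored('Brown', 'Wilson'))
```

If this is what happens here, a commonly used workaround is to move `merge_names` into a separate `.py` file and import it in the notebook, so that the worker processes can import it as well.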

    But if you just need similar functionality, I recommend joblib:

    from joblib import Parallel, delayed
    from itertools import product
    
    def merge_names(a, b):
        return '{} & {}'.format(a, b)
    
    names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
    result = Parallel(n_jobs=3, prefer="processes")(
        delayed(merge_names)(a, b) for a, b in product(names, repeat=2)
    )
    print(result)
    

    joblib also provides a reusable pool of workers through a context manager, which can be used much like the pool in your example:

    with Parallel(n_jobs=2) as parallel:
       ...