Tags: python, multithreading, multiprocessing, joblib

Issues when choosing between threading and multiprocessing


I have an issue with parallelising code using joblib.Parallel. When the backend is threading it works as intended, as seen below: both prints show the expected results. When I change the backend to multiprocessing, the code

  1. runs way faster
  2. the first print works as intended
  3. the second print (which should show the final results) is a list of None, ignoring the values the workers printed

Here is a similar MWE:

from joblib import Parallel, delayed

def E_th(i, tt, out_list):
    out_list[tt] = tt + i
    print(out_list[tt])  # >> prints correct results
    return 1


if __name__ == "__main__":
    time = range(0, 10)
    for i in range(0, 2):
        out_list = [None] * len(time)
        Parallel(n_jobs=64, backend='threading')(delayed(E_th)(i, tt, out_list) for tt in range(len(time)))

    print(out_list)  # >> prints correct results

from joblib import Parallel, delayed

def E_th(i, tt, out_list):
    out_list[tt] = tt + i
    print(out_list[tt])  # >> prints correct results
    return 1


if __name__ == "__main__":
    time = range(0, 10)
    for i in range(0, 2):
        out_list = [None] * len(time)
        Parallel(n_jobs=64, backend='multiprocessing')(delayed(E_th)(i, tt, out_list) for tt in range(len(time)))

    print(out_list)  # >> prints [None, None, ...]

I'm probably super bad at this, so if there is a simple explanation of what's going on, I'll try to fix it :)


Solution

  • Multithreaded: out_list is passed by reference to the worker threads, so when they modify it the change is visible in every thread, because all threads share the same memory.

    Multiprocessing: out_list (in fact, the whole memory footprint of the parent) is copied into each child process, so when the children update their copies of the list the change is not propagated back to the parent, where the final print happens. A backend-agnostic fix is sketched below.
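
Instead of mutating a shared list, a common workaround that works with any backend is to return the computed value from the worker and let Parallel collect the results (joblib returns them in the order of the input iterable). A minimal sketch, reusing the names from the MWE above (n_jobs is lowered to 4 purely for illustration):

from joblib import Parallel, delayed

def E_th(i, tt):
    # Return the value instead of writing into a shared list;
    # Parallel gathers the return values for us.
    return tt + i


if __name__ == "__main__":
    time = range(0, 10)
    for i in range(0, 2):
        out_list = Parallel(n_jobs=4, backend='multiprocessing')(
            delayed(E_th)(i, tt) for tt in range(len(time))
        )
        print(out_list)  # >> [0, 1, ..., 9] for i = 0, [1, 2, ..., 10] for i = 1

If in-place sharing is really required, joblib's require='sharedmem' argument is an alternative, but it falls back to a thread-based backend and therefore gives up the multiprocessing speed-up.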