Tags: python, python-3.x, python-multiprocessing, pool

Why does Python appear to use 'call by value' when a dict (a mutable type) is passed to a function run through a multiprocessing Pool?


Note: This is a contrived example of a bigger problem.

from multiprocessing import Pool

dict1 = {'key1':1}

def alterDict(dict_num):
    for key in dict_num:
        dict_num[key] = 20000

alterDict(dict1)
print(dict1) # output is {'key1': 20000}



dict1 = {'key1':1}


with Pool(2) as p:
    p.map(alterDict,[dict1])

print(dict1) # output is {'key1': 1}

Why are the outputs different? Is there a way to stop Pool from using a 'call by value' style of function call? I want Pool to use a 'call by reference' style instead.


Solution

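  • When you use multiprocessing and want worker processes to modify an object such as a dict or list, you need shared state between processes (see "Sharing state between processes" in the multiprocessing documentation). Pool pickles each argument and sends it to a worker process, so the worker mutates its own copy and the parent's dict1 never changes. A manager dict is a proxy to a single dict held by a manager process, so changes made by workers are visible to the parent.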

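    import multiprocessing as mp

    def alterDict(dict_num):
        # dict_num is a DictProxy; each assignment is forwarded to the manager process
        for key, _ in dict_num.items():
            dict_num[key] = 20000

    if __name__ == '__main__':  # guard needed where processes are spawned (e.g. Windows)
        with mp.Manager() as manager:
            d = manager.dict()  # shared dict that lives in the manager process
            d['key'] = 1
            with manager.Pool() as pool:
                pool.map(alterDict, [d])
            print(dict(d))  # copy the proxy's contents into a plain dict

    # {'key': 20000} # output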
    

    By the way, iterate with dict_num.items() rather than over dict_num directly; otherwise you will get an error like this one (traceback from Python 3.8):

    /usr/local/lib/python3.8/multiprocessing/managers.py in _callmethod(self, methodname, args, kwds)
        848             dispatch(conn, None, 'decref', (token.id,))
        849             return proxy
    --> 850         raise convert_to_error(kind, result)
        851 
        852     def _getvalue(self):
    
    AttributeError: 'NoneType' object has no attribute '_registry'
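
To see why the plain dict in the question stays unchanged, it may help to reproduce the copy step without any processes at all. The sketch below assumes that a pickle round trip is a fair stand-in for what Pool does to each argument before a worker receives it; the names alter_dict, original and worker_copy are made up for this illustration and are not part of the original post.

```python
import pickle

def alter_dict(dict_num):
    # Mutates whatever dict object it is handed, in place.
    for key in dict_num:
        dict_num[key] = 20000

original = {'key1': 1}

# Stand-in for the Pool hand-off (an assumption for illustration):
# serialize in the parent, deserialize on the worker side.
worker_copy = pickle.loads(pickle.dumps(original))

alter_dict(worker_copy)

print(worker_copy is original)  # False: the round trip built a new object
print(worker_copy)              # {'key1': 20000}
print(original)                 # {'key1': 1}
```

The function still receives a reference, as it always does in Python; the reference simply points at a copy. That is why a Manager proxy, or returning the modified dict from the worker and collecting it from pool.map, is needed to get the change back to the parent.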