Tags: python, python-2.7, multiprocessing, pool

Passing arguments and a manager.dict to Pool in multiprocessing in Python 2.7


I want to parallelise a function that updates a shared dictionary, using Pool instead of Process so that I don't allocate too many CPUs.

i.e. can I take this:

def my_function(bar,results):
    results[bar] = bar*10

def paralell_XL():

    from multiprocessing import Pool, Manager, Process

    manager = Manager()
    results = manager.dict()

    jobs = []
    for bar in foo:
        p = Process(target=my_function, args=(bar, results))
        jobs.append(p)
        p.start()

    for proc in jobs:
        proc.join()

and change the paralell_XL() function to something like this?

def paralell_XL():

    from multiprocessing import Pool, Manager, Process

    manager = Manager()
    results = manager.dict()

    p = Pool(processes=4)
    p.map(my_function, (foo, results))

Trying the above gives the following error:

TypeError: unsupported operand type(s) for //: 'int' and 'DictProxy'
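(For reference, the Process-based version at the top does work. Here is a self-contained sketch of it, with a hypothetical foo = range(3) and the loop wrapped in a function so the result can be inspected:)

```python
from multiprocessing import Manager, Process

def my_function(bar, results):
    results[bar] = bar * 10

def run_processes(foo):
    # manager.dict() returns a proxy to a dict living in the manager's
    # server process, so all child processes see each other's updates
    manager = Manager()
    results = manager.dict()

    jobs = []
    for bar in foo:
        p = Process(target=my_function, args=(bar, results))
        jobs.append(p)
        p.start()
    for proc in jobs:
        proc.join()
    return dict(results)  # snapshot the proxy as a plain dict
```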

thanks


Solution

  • So the problem is with passing multiple arguments to Pool.map. As demonstrated in Python multiprocessing pool.map for multiple arguments, you just need to pack the arguments into a tuple and add a small wrapper that unpacks them. This also works for passing a manager.dict as an argument.

    def my_function(bar, results):
        results[bar] = bar * 10

    def func_star(a_b):
        """Convert `f([1,2])` to `f(1,2)` call."""
        return my_function(*a_b)

    def paralell_XL():

        from multiprocessing import Pool, Manager
        import itertools

        manager = Manager()
        results = manager.dict()

        pool = Pool(processes=4)
        # pair each element of foo with the same shared results proxy
        pool.map(func_star, itertools.izip(foo, itertools.repeat(results)))
        pool.close()
        pool.join()
    

    (Note: I think this question and answer are worth keeping, as it wasn't fully clear to me that you could pass the manager.dict into the function this way.)
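(A side note beyond the original answer: on Python 3 the wrapper becomes unnecessary, because Pool.starmap unpacks the argument tuples itself, and itertools.izip is gone since the built-in zip is already lazy there. A sketch of the same idea, again assuming a hypothetical foo = range(4):)

```python
from itertools import repeat
from multiprocessing import Manager, Pool

def my_function(bar, results):
    results[bar] = bar * 10

def parallel_XL(foo):
    manager = Manager()
    results = manager.dict()

    pool = Pool(processes=4)
    # starmap calls my_function(bar, results) for every (bar, results) pair
    pool.starmap(my_function, zip(foo, repeat(results)))
    pool.close()
    pool.join()
    return dict(results)
```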