Tags: python, python-multiprocessing

How to use Values in a multiprocessing pool with Python


I want to use `Value` from the multiprocessing library to keep track of data. As far as I know, when it comes to multiprocessing in Python, each process gets its own copy of any globals, so I can't just edit a global variable. I want to use `Value` to solve this issue. Does anyone know how I can pass `Value` data into a pooled function?

from multiprocessing import Pool, Value
from functools import partial
import itertools

arr = [2,6,8,7,4,2,5,6,2,4,7,8,5,2,7,4,2,5,6,2,4,7,8,5,2,9,3,2,0,1,5,7,2,8,9,3,2,]

def hello(g, data):
    data.value += 1

if __name__ == '__main__':
    data = Value('i', 0)
    func = partial(hello, data)
    p = Pool(processes=1)
    p.map(hello, zip(arr, itertools.repeat(data)))

    print(data.value)

Here is the runtime error I'm getting:

RuntimeError: Synchronized objects should only be shared between processes through inheritance

Does anyone know what I'm doing wrong?


Solution

  • There is an issue with sharing a synchronized `Value` through a Pool that you don't get when creating subprocesses manually. For example, the following works:

    from multiprocessing import Process, Value
    
    arr = [1,2,3,4,5,6,7,8,9]
    
    
    def hello(data, g):
        with data.get_lock():
            data.value += 1
        print(id(data), g, data.value)
    
    if __name__ == '__main__':
        data = Value('i', 0)
        print(id(data))
    
        processes = []
        for n in arr:
            p = Process(target=hello, args=(data, n))
            processes.append(p)
            p.start()
    
        for p in processes:
            p.join()
    
        print("sub process tasks completed")
        print(data.value)
    

    However, if you do basically the same thing using a Pool, you get the error "RuntimeError: Synchronized objects should only be shared between processes through inheritance". I have seen that error when using a pool before; as far as I can tell, it happens because a Pool pickles each task's arguments, and synchronized objects refuse to be pickled that way, whereas a `Process` child simply inherits the object when it starts.

    An alternative to using Value that seems to work with Pool is to use a Manager to give you a 'shared' list:

    from multiprocessing import Pool, Manager
    from functools import partial
    
    
    arr = [1,2,3,4,5,6,7,8,9]
    
    
    def hello(data, g):
        data[0] += 1
    
    
    if __name__ == '__main__':
        m = Manager()
        data = m.list([0])
        hello_data = partial(hello, data)
        p = Pool(processes=5)
        p.map(hello_data, arr)
    
    print(data[0])