Tags: python, parallel-processing, multiprocessing

Parallel workers initialized with individual objects in Python


I need a number of workers in Python, each initialized with its own object. The main process will send commands and arguments to the workers, which should then execute methods on their object in parallel and return the results.

I have tried multiprocessing.Pool, but when I call pool.map, which process handles which argument appears to be random, even when the pool is created with N processes and chunksize is set to 1.

import multiprocessing


def init(a):
    global myA
    myA = a


def get_value(_):
    global myA
    return myA.value


class A():
    def __init__(self, value):
        self.value = value


if __name__ == '__main__':
    N = 4
    a_lst = [A(i) for i in range(N)]
    pool = multiprocessing.Pool(N)
    pool.map(init, a_lst, chunksize=1)
    print(pool.map(get_value, range(N), chunksize=1))

Output:

[3, 1, 3, 1]

Can I do this with multiprocessing.Pool, and if not, how else can I do it?


Solution

  • One solution that does not depend on the pool size or the chunksize value, i.e. one where it does not matter which pool process handles each task submitted with multiprocessing.pool.Pool.map, is to initialize every pool process with the entire a_lst list by passing the initializer and initargs arguments to the Pool constructor.

    In the following code I have deliberately made the length of a_lst and the pool size different, and I let the map method compute a default chunksize value:

    import multiprocessing
    
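    def init_pool(lst):
        # Runs once in each pool process: store the full list in a global
        global a_lst
        a_lst = lst
    
    def get_value(i):
        # Any worker can serve index i because each one holds the whole list
        return a_lst[i].value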
    
    class A():
        def __init__(self, value):
            self.value = value
    
    if __name__ == '__main__':
        N = 10 # list is length 10
        POOL_SIZE = 4 # pool size is 4
    
        a_lst = [A(i) for i in range(N)]
        pool = multiprocessing.Pool(POOL_SIZE, initializer=init_pool, initargs=(a_lst,))
        print(pool.map(get_value, range(N)))
    

    Prints:

    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
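
If each object really has to live in one specific worker (for example because its methods change the object's state), a pool may not be the right fit, since Pool.map does not let you choose which process runs a task. What follows is a minimal sketch of one possible alternative, not part of the answer above: start one multiprocessing.Process per object and talk to it over its own multiprocessing.Pipe. The add method, the worker_loop and start_workers helpers, and the (method_name, args) message format are hypothetical names chosen only for illustration.

```python
import multiprocessing


class A:
    def __init__(self, value):
        self.value = value

    def add(self, x):
        # Hypothetical method, added only to show a command with an argument
        return self.value + x


def worker_loop(obj, conn):
    """Serve (method_name, args) commands against obj until None arrives."""
    while True:
        msg = conn.recv()
        if msg is None:  # sentinel: shut this worker down
            break
        method_name, args = msg
        conn.send(getattr(obj, method_name)(*args))
    conn.close()


def start_workers(objects):
    """Start one process per object; return a list of (process, connection)."""
    workers = []
    for obj in objects:
        parent_conn, child_conn = multiprocessing.Pipe()
        proc = multiprocessing.Process(target=worker_loop, args=(obj, child_conn))
        proc.start()
        child_conn.close()  # the parent only uses its own end of the pipe
        workers.append((proc, parent_conn))
    return workers


if __name__ == '__main__':
    workers = start_workers([A(i) for i in range(4)])

    # Send every command first so that the workers run in parallel ...
    for _, conn in workers:
        conn.send(('add', (10,)))

    # ... then collect: result k always comes from the worker holding A(k)
    print([conn.recv() for _, conn in workers])

    # Tell each worker to exit and wait for it
    for proc, conn in workers:
        conn.send(None)
        proc.join()
```

Prints:

[10, 11, 12, 13]

Because every worker has a dedicated connection, the main process decides exactly which object receives which command. The trade-off is that you manage process start-up, shutdown and error handling yourself rather than leaving them to the pool.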