python parallel-processing multiprocessing python-multiprocessing process-pool

How to assign values to array from inside the worker_funtion of multiprocessing.Pool.map?

Basically what I want is to insert those 2's into ar, so that ar gets changed outside the worker_function.

import numpy as np
import multiprocessing as mp
from functools import partial


def worker_function(i=None, ar=None):
    val = 2
    ar[i] = val
    print(ar)


def main():
    ar = np.zeros(5)
    func_part = partial(worker_function, ar=ar)
    mp.Pool(1).map(func_part, range(2))
    print(ar)


if __name__ == '__main__':
    main()

The only thing I can achieve so far is changing the copy of ar inside worker_function but not outside the function:

[2. 0. 0. 0. 0.]
[0. 2. 0. 0. 0.]
[0. 0. 0. 0. 0.]

Solution

For performance you should use a shared-memory multiprocessing.Array here to avoid reconstructing and sending arrays across different processes again and again. The array will be the same in all processes, which isn't the case in your example where you send copies around. That's also the reason you don't see the changes made in the parent.

import multiprocessing as mp
import numpy as np


def worker_function(i):
    global arr
    val = 2
    arr[i] = val
    print(mp.current_process().name, arr[:])


def init_arr(arr):
    globals()['arr'] = arr


def main():
    # as long as we don't conditionally modify the same indices 
    # from multiple workers, we don't need the lock ...
    arr = mp.Array('i', np.zeros(5, dtype=int), lock=False)
    mp.Pool(2, initializer=init_arr, initargs=(arr,)).map(worker_function, range(5))
    print(mp.current_process().name, arr[:])


if __name__ == '__main__':
    main()

Output:

ForkPoolWorker-1 [2, 0, 0, 0, 0]
ForkPoolWorker-2 [2, 2, 0, 0, 0]
ForkPoolWorker-1 [2, 2, 2, 0, 0]
ForkPoolWorker-2 [2, 2, 2, 2, 0]
ForkPoolWorker-1 [2, 2, 2, 2, 2]
MainProcess [2, 2, 2, 2, 2]

Process finished with exit code 0