Is it efficient to calculate many results in parallel with multiprocessing.Pool.map() in a situation where each input value is large (say 500 MB), but where the input values generally contain the same large object? I am afraid that the way multiprocessing works is by sending a pickled copy of each input value to each worker process in the pool. If no optimization is performed, this would mean sending a lot of data for each input value in map(). Is this the case? I had a quick look at the multiprocessing code but did not find anything obvious.
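For scale, here is a quick check of how much data a single pickled task would carry, assuming NumPy arrays of roughly the sizes described (the numbers are purely illustrative):

import pickle
import numpy as np

vector = np.random.rand(1000)
very_large_matrix = np.zeros((8000, 8000))   # ~512 MB of float64

# Pool.map pickles each element of the iterable before sending it to a worker,
# so this is roughly the payload for one (vector, matrix) task.
payload = pickle.dumps((vector, very_large_matrix))
print(f"{len(payload) / 1e6:.0f} MB per task")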
More generally, what simple parallelization strategy would you recommend for doing a map() over, say, 10,000 values, each of them being a tuple (vector, very_large_matrix), where the vectors are always different but there are only, say, 5 different very large matrices?
PS: the big input matrices actually appear "progressively": 2,000 vectors are first sent along with the first matrix, then 2,000 vectors are sent with the second matrix, etc.
I think the obvious solution is to send a reference to the very_large_matrix instead of a copy of the object itself. If there are only five big matrices, create them in the main process. Then, when the multiprocessing.Pool is instantiated, it will create a number of child processes that clone the parent process's address space. That means that if there are six processes in the pool, there will be (1 + 6) * 5 different matrices in memory simultaneously.
So in the main process create a lookup of all unique matrices:
matrix_lookup = {1: matrix(...), 2: matrix(...), ...}
Then pass the index of each matrix in the matrix_lookup along with the vectors to the worker processes:
from multiprocessing import Pool

pool = Pool(6)
pool.map(func, [(vector, 1), (vector, 2), (vector, 1), ...])
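Putting it together, here is a minimal end-to-end sketch. It assumes a fork-based start method (the default on Linux), uses NumPy arrays to stand in for the matrices and vectors, and uses a placeholder computation in func; on platforms that use "spawn" the lookup would have to be handed to the workers through a Pool initializer instead.

import numpy as np
from multiprocessing import Pool

# Build the five large matrices once, in the parent process, before the Pool
# exists; with a fork-based start method the children inherit this dict.
matrix_lookup = {i: np.random.rand(1000, 1000) for i in range(5)}

def func(task):
    # Each task is (vector, matrix_key); the matrix is looked up locally
    # instead of being shipped with every call.
    vector, key = task
    return matrix_lookup[key] @ vector   # placeholder computation

if __name__ == "__main__":
    # 10,000 small vectors, each tagged with the key of its matrix.
    tasks = [(np.random.rand(1000), i % 5) for i in range(10_000)]
    with Pool(6) as pool:
        results = pool.map(func, tasks)
    print(len(results))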