Tags: numpy, parallel-processing, shared-memory, ray

Simultaneous matrix-vector multiplication with multiprocessing


What I want to do

I have an m x n numpy array A, where m << n, that I want to load on a node whose 20 CPUs share memory. On each CPU, I want to multiply A by an n x 1 vector v, where v differs from CPU to CPU but A stays the same.

Constraint

Matrix A is large enough that I cannot hold a separate copy of it on each CPU, so I would like to put A in shared node memory. And since A*v is just m x 1, I should never need to store anything of size m x n per CPU, only the single copy of A in shared memory.
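To make the shapes concrete, here is a toy check (the numbers are arbitrary):

    import numpy as np

    m, n = 50, 200_000            # placeholder sizes with m << n
    A = np.random.rand(m, n)      # the one large matrix
    v = np.random.rand(n)         # a per-CPU vector
    print((A @ v).shape)          # (50,) -- each CPU only ever holds an m-length result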

The references below tell me I can use shared memory with the multiprocessing module.

My question

If I have one worker per CPU, can each worker simultaneously compute A*v (with a different v for each worker) using the multiprocessing module?

My concern is that, because every worker accesses the same shared memory at the same time, multiprocessing would end up copying matrix A into each process, which would cause memory problems for me.
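To make the question concrete, here is roughly what I have in mind with multiprocessing.shared_memory (Python 3.8+). The sizes, helper names, and Pool setup are placeholders, and I have not verified that this actually avoids the per-process copy:

    from multiprocessing import Pool, shared_memory
    import numpy as np

    def init_worker(shm_name, shape, dtype):
        # Attach this worker to the existing shared block; A is not copied here.
        global A_view, _shm
        _shm = shared_memory.SharedMemory(name=shm_name)
        A_view = np.ndarray(shape, dtype=dtype, buffer=_shm.buf)

    def multiply(v):
        return A_view @ v  # m x 1 result, small enough to send back cheaply

    if __name__ == "__main__":
        m, n = 50, 200_000                            # placeholder sizes
        A = np.random.rand(m, n)
        vs = [np.random.rand(n) for _ in range(20)]   # one vector per CPU

        # Copy A exactly once into a named shared-memory block.
        shm = shared_memory.SharedMemory(create=True, size=A.nbytes)
        np.ndarray(A.shape, dtype=A.dtype, buffer=shm.buf)[:] = A

        with Pool(processes=20, initializer=init_worker,
                  initargs=(shm.name, A.shape, A.dtype)) as pool:
            results = pool.map(multiply, vs)          # 20 arrays of shape (m,)

        shm.close()
        shm.unlink()

The idea is that the Pool initializer attaches every worker to the same named block, so there should be only one copy of A in RAM, and each worker only sends back its m-length result.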

Edit:

The question has been edited to remove the part about ray.io, because I just learned that Ray has its own forum, so I am asking the Ray version of this question there instead.

References

  1. Shared-memory objects in multiprocessing
  2. How to do parallel programming in Python?

Solution

  • For posterity, here is the answer copy-pasted from discuss.ray.io:

    The recommended way to do this would be to wrap these all in one Ray job. The setup is a bit different compared to something like MPI or multiprocessing; instead of you launching python workers explicitly, Ray will automatically start workers and execute tasks for you. Also, all of the tasks and shared-memory objects created during one “application” are associated with the “driver”, aka the process that executes the main python script, so once that script exits, all of the associated tasks and objects will be cleaned up as well.

    The code from the other post is a good start:

    import ray
    
    @ray.remote
    def multiply(A, v):
        return A @ v  # Matrix-vector product; put your worker code here.
    
    A_ref = ray.put(A)  # Put A in Ray's shared-memory object store.
    refs = [multiply.remote(A_ref, v) for v in vs]
    results = ray.get(refs)
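
  • For completeness, a slightly fuller, runnable version of that sketch might look like the following; the shapes, the num_cpus value, and the ray.shutdown() call are my own additions, not part of the forum answer:

    import numpy as np
    import ray

    ray.init(num_cpus=20)             # one worker per CPU on the node

    m, n = 50, 200_000                # placeholder sizes
    A = np.random.rand(m, n)
    vs = [np.random.rand(n) for _ in range(20)]

    @ray.remote
    def multiply(A, v):
        return A @ v                  # each task returns only an m-length result

    A_ref = ray.put(A)                # one copy of A in the shared-memory object store
    results = ray.get([multiply.remote(A_ref, v) for v in vs])

    ray.shutdown()

    Because A is a numpy array stored in the object store, each task should receive it as a read-only, zero-copy view rather than its own copy.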