
Simultaneous matrix vector multiplication with multiprocessing

What I want to do

I have an m x n numpy array A, where m << n, that I want to load on a node where all 20 CPUs can share memory. On each CPU, I want to multiply A by an n x 1 vector v, where the vector v differs on each CPU but the matrix A stays the same.


Matrix A is sufficiently large so that I cannot load A on each CPU, so I would like to put A in shared node memory. And since A*v is just m x 1, I think I never need to store matrices of size m x n on each CPU (just one copy of A in shared memory).

The references below tell me I can use shared memory with the multiprocessing module.

My question

If I have 1 worker per CPU, can each worker simultaneously compute A x v (where v is different for each worker) using the multiprocessing module?

I am concerned that, because every worker accesses the same shared memory simultaneously, multiprocessing would copy matrix A for each worker, which would cause memory problems for me.


Question has been edited to remove the part about Ray, because I just learned Ray has its own forum, so I am asking the same question about Ray on that forum.


  1. Shared-memory objects in multiprocessing
  2. How to do parallel programming in Python?
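For the multiprocessing route the question asks about, one possible approach is sketched below, assuming Python 3.8+ (for multiprocessing.shared_memory) and NumPy. A is copied once into a named shared block, and each worker attaches to that block by name and builds an array view on it, so only the m x 1 result is allocated per worker. The names multiply_shared and run_demo and the small sizes are made up for illustration, not taken from the question.

```python
import numpy as np
from multiprocessing import Pool, shared_memory


def multiply_shared(args):
    """Attach to the shared block by name and return A @ v (an m-vector)."""
    shm_name, shape, dtype, v = args
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        # View the shared buffer as an array; no copy of A is made here.
        A = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
        result = A @ v  # new m-element array owned by this worker
        del A  # drop the view so the buffer can be closed
    finally:
        shm.close()
    return result


def run_demo(m=4, n=1000, n_workers=2, n_vectors=5, seed=0):
    """Hypothetical driver: small sizes, random data, one vector per task."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    vs = [rng.standard_normal(n) for _ in range(n_vectors)]

    shm = shared_memory.SharedMemory(create=True, size=A.nbytes)
    try:
        A_shared = np.ndarray(A.shape, dtype=A.dtype, buffer=shm.buf)
        A_shared[:] = A  # the single copy into shared memory
        del A_shared
        # Only the block name, shape, dtype and v are pickled per task.
        tasks = [(shm.name, A.shape, A.dtype.str, v) for v in vs]
        with Pool(n_workers) as pool:
            results = pool.map(multiply_shared, tasks)
    finally:
        shm.close()
        shm.unlink()  # free the block once all workers are done
    return A, vs, results


if __name__ == "__main__":
    A, vs, results = run_demo()
    print(all(np.allclose(r, A @ v) for r, v in zip(results, vs)))
```

Because the workers only read A, no locking is used in this sketch. In a real run, A would ideally be loaded straight into the shared block rather than built in regular memory first, which this demo does only for brevity.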


  • For posterity, here is the answer copy-pasted from

    The recommended way to do this would be to wrap these all in one Ray job. The setup is a bit different compared to something like MPI or multiprocessing; instead of you launching python workers explicitly, Ray will automatically start workers and execute tasks for you. Also, all of the tasks and shared-memory objects created during one “application” are associated with the “driver”, aka the process that executes the main python script, so once that script exits, all of the associated tasks and objects will be cleaned up as well.

    The code from the other post is a good start:

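    import ray

    # A (m x n array) and vs (iterable of length-n vectors) are assumed to exist.
    @ray.remote
    def multiply(A, v):
        return A @ v  # Matrix-vector product; put your worker code here.

    A_ref = ray.put(A)  # Put A in Ray's shared-memory object store.
    refs = [multiply.remote(A_ref, v) for v in vs]
    results = ray.get(refs)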