Tags: cuda, mpi, cuda-streams

Using __constant__ memory with MPI and streams


If I have a __constant__ value

__constant__ float constVal;

which may or may not be initialized by each MPI rank on a non-blocking stream:

cudaMemcpyToSymbolAsync(constVal, deviceValue, sizeof(float), 0, cudaMemcpyDeviceToDevice, stream);

Is this:

  1. Is it safe to be accessed by multiple MPI ranks simultaneously within kernels? I.e. do all ranks share the same instance of constVal, or do MPI semantics (each rank has a private copy) still hold?
  2. If the above is safe, is it also safe for it to be initialized by multiple MPI ranks?

Solution

    1. Is it safe to be accessed by multiple MPI ranks simultaneously within kernels? I.e. do all ranks share the same instance of constVal, or do MPI semantics (each rank has a private copy) still hold?

    Neither. CUDA contexts are not shared amongst processes. If you have multiple processes you get multiple contexts, and each context has its own copy of all the statically defined symbols and code. This behaviour is independent of MPI semantics. If you are imagining that multiple processes in an MPI communicator are sharing the same GPU context and state, they aren't.
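    This per-process isolation can be demonstrated directly. The following is a minimal sketch (the kernel name and output handling are illustrative, not from the question): each rank writes its own rank id into constVal and reads it back from a kernel. Every rank sees only the value it wrote, because each process's context holds a private copy of the symbol.

    ```cuda
    #include <mpi.h>
    #include <cstdio>

    __constant__ float constVal;

    // Reads the per-context copy of the __constant__ symbol.
    __global__ void readConst(float *out) { *out = constVal; }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        // Each rank stores a rank-specific value into its own copy.
        float v = (float)rank;
        cudaMemcpyToSymbol(constVal, &v, sizeof(float));

        float *dOut, hOut;
        cudaMalloc(&dOut, sizeof(float));
        readConst<<<1, 1>>>(dOut);
        cudaMemcpy(&hOut, dOut, sizeof(float), cudaMemcpyDeviceToHost);

        // Each rank prints its own rank id: the writes do not interfere,
        // because the symbol is per-process state, not shared state.
        printf("rank %d sees constVal = %f\n", rank, hOut);

        cudaFree(dOut);
        MPI_Finalize();
        return 0;
    }
    ```
    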

    2. If the above is safe, is it also safe for it to be initialized by multiple MPI ranks?

    It isn't only safe, it is mandatory. Because each rank's context has its own copy of constVal, every rank must perform the initialization itself; no rank can initialize the symbol on another rank's behalf.
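    So the per-rank initialization looks like the question's call, issued by every rank. One caveat worth noting in a sketch (deviceValue, stream, and the kernel name are assumed as in the question): the copy is asynchronous, so any kernel that reads constVal must either run on the same stream (stream ordering guarantees it sees the new value) or wait for an explicit synchronization.

    ```cuda
    // Every rank issues its own copy into its own context's symbol.
    cudaMemcpyToSymbolAsync(constVal, deviceValue, sizeof(float), 0,
                            cudaMemcpyDeviceToDevice, stream);

    // Same stream: ordered after the copy, sees the new constVal.
    myKernel<<<grid, block, 0, stream>>>(args);

    // Different stream: synchronize first, or the kernel may race
    // against the still-pending symbol copy.
    cudaStreamSynchronize(stream);
    otherKernel<<<grid, block, 0, otherStream>>>(args);
    ```
    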