What are the advantages of using MPI on a UMA machine? It seems to me that it would make more sense to use OpenMP on a UMA machine, since all the cores share memory, whereas MPI makes more sense on a NUMA machine, because NUMA gives each process its own memory.
The value in using a distributed-memory programming model like MPI or Charm++, even on nominally uniform shared-memory hardware, is that it engenders a much more locality-conscious design of the algorithms and implementation. Even for a single core, memory access costs are non-uniform: assumptions of spatial and temporal locality are baked deeply into the design of common microprocessor memory hierarchies. Designing for distributed memory also means designing to operate on local chunks of data, rather than on the entire working set at once.
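To make the "operate on local chunks" point concrete, here is a minimal, hypothetical MPI sketch in C: each rank allocates and updates only its own slice of a 1-D domain and communicates just the boundary values, so the per-rank working set stays small and cache-friendly regardless of whether the ranks happen to share memory. The chunk size `LOCAL_N` and the stencil-like update are purely illustrative, not taken from any particular application.

```c
/* Each rank owns a local chunk plus one ghost cell on each side and
 * exchanges only boundary values with its neighbours. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define LOCAL_N 1024   /* points owned by each rank (illustrative size) */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *u = malloc((LOCAL_N + 2) * sizeof(double));
    for (int i = 1; i <= LOCAL_N; ++i)
        u[i] = rank;                 /* arbitrary local initial data */
    u[0] = u[LOCAL_N + 1] = 0.0;     /* ghost cells */

    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    /* Communicate only the boundaries... */
    MPI_Sendrecv(&u[1],           1, MPI_DOUBLE, left,  0,
                 &u[LOCAL_N + 1], 1, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&u[LOCAL_N],     1, MPI_DOUBLE, right, 1,
                 &u[0],           1, MPI_DOUBLE, left,  1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* ...then do all the heavy work on memory this rank owns. */
    double sum = 0.0;
    for (int i = 1; i <= LOCAL_N; ++i)
        sum += 0.5 * (u[i - 1] + u[i + 1]);

    printf("rank %d: local result %g\n", rank, sum);
    free(u);
    MPI_Finalize();
    return 0;
}
```

Even when all ranks run on one UMA node, each rank's arrays live in memory it allocated and touched itself, so the hot loop runs almost entirely out of that core's cache.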
Also, keep in mind that even a single-socket multicore system still has private caches for each core, and that transferring data from one cache to another entails communication costs greater than those of access to private data in the local cache. For an example of how this can play out in applications, see Jetley & Kale, "Optimizations for Message Driven Applications on Multicore Architectures", published at HiPC 2011.
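A small, hypothetical OpenMP-in-C sketch of that cache-to-cache cost: in the first loop every thread repeatedly writes to counters that sit on the same cache line, so the line bounces between private caches (false sharing); in the second loop each thread writes to its own padded slot and the bouncing stops. The iteration count and 64-byte line size are assumptions for illustration only.

```c
#include <omp.h>
#include <stdio.h>

#define ITERS 10000000L
#define MAX_THREADS 64

/* One counter per cache line (assuming 64-byte lines). */
struct padded { volatile long value; char pad[64 - sizeof(long)]; };

int main(void) {
    /* volatile keeps the compiler from collapsing the loops into one add */
    volatile long shared_counters[MAX_THREADS] = {0};  /* adjacent in memory */
    struct padded padded_counters[MAX_THREADS] = {{0}};

    double t0 = omp_get_wtime();
    #pragma omp parallel
    {
        int t = omp_get_thread_num();
        for (long i = 0; i < ITERS; ++i)
            shared_counters[t]++;              /* cache line ping-pongs */
    }
    double t1 = omp_get_wtime();

    #pragma omp parallel
    {
        int t = omp_get_thread_num();
        for (long i = 0; i < ITERS; ++i)
            padded_counters[t].value++;        /* stays in the local cache */
    }
    double t2 = omp_get_wtime();

    printf("false sharing: %.3f s, padded: %.3f s\n", t1 - t0, t2 - t1);
    return 0;
}
```

The data are "shared" in both loops, yet only the second one keeps each core working out of its own cache, which is exactly the discipline a distributed-memory design forces on you from the start.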