
Upper bound on speedup


My MPI experience showed that the speedup does not increase linearly with the number of nodes we use (because of the costs of communication). My experience is similar to this: MPI speedup drops after some point.

Today a speaker said: "Magically (smiles), in some occasions we can get more speedup than the ideal one!".

He meant that, ideally, when we use 4 nodes we would get a speedup of 4, but in some cases we can get a speedup greater than 4 with 4 nodes! The topic was related to MPI.

Is this true? If so, can anyone provide a simple example of that? Or was he perhaps thinking about adding multithreading to the application? (He ran out of time and had to leave ASAP, so we could not discuss it.)


Solution

  • Parallel efficiency (speed-up / number of parallel execution units) over unity is not at all uncommon.

    The main reason for this is the total cache size available to the parallel program. With more CPUs (or cores), one has access to more cache memory. At some point, a large portion of the data fits inside the cache, and this speeds up the computation considerably. Another way to look at it: the more CPUs/cores you use, the smaller the portion of the data each one gets, until that portion actually fits inside the cache of an individual CPU. Sooner or later this is cancelled out by the communication overhead, though (see the first sketch below).

    Also, your data shows the speed-up relative to execution on a single node. Using OpenMP for intranode parallelism could remove some of the overhead of using MPI for intranode data exchange, and therefore yield a better speed-up than the pure MPI code (see the hybrid sketch below).

    The real problem comes from the incorrectly used term ideal speed-up: ideally, one would account for cache effects too. I would rather call it linear speed-up instead.
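
To illustrate the cache mechanism, here is a minimal sketch (not from the original answer): each rank repeatedly sweeps over its slice of a large shared workload. With few ranks the slice is far larger than the last-level cache; with enough ranks it fits, and each sweep suddenly runs much faster, which can push the measured speed-up above the rank count. The sizes `N_GLOBAL` and `SWEEPS` are made-up values for illustration.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N_GLOBAL (1 << 26)   /* 2^26 doubles = 512 MiB in total (made-up size) */
#define SWEEPS   100         /* repeated passes so cache residency matters     */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* per-rank working set shrinks as the rank count grows */
    size_t n_local = (size_t)N_GLOBAL / size;
    double *a = malloc(n_local * sizeof *a);
    for (size_t i = 0; i < n_local; i++)
        a[i] = (double)i;

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    /* repeated sweeps over the same slice: once n_local * 8 bytes
       fits in cache, each pass is served from cache, not DRAM */
    double sum = 0.0;
    for (int s = 0; s < SWEEPS; s++)
        for (size_t i = 0; i < n_local; i++)
            sum += a[i];

    double dt = MPI_Wtime() - t0, t_max;
    MPI_Reduce(&dt, &t_max, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%d ranks: %.3f s (checksum %g)\n", size, t_max, sum);

    free(a);
    MPI_Finalize();
    return 0;
}
```

Running this with increasing rank counts (e.g. `mpirun -np 1`, `-np 4`, `-np 16`) and comparing the reported times should show the effect: once the per-rank slice drops below the cache size, the time can fall by more than the factor by which the rank count grew.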
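And here is a hedged sketch of the hybrid MPI + OpenMP layout mentioned above, assuming one MPI rank per node with the node's cores driven by OpenMP threads, so intranode exchange happens through shared memory rather than MPI messages. The loop body is just a placeholder workload.

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;
    /* FUNNELED: only the main thread makes MPI calls */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* intranode parallelism via OpenMP threads - no MPI traffic here */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < 1000000; i++)
        local += 1.0 / (1.0 + i);

    /* internode communication stays with MPI: one message per node
       instead of one per core */
    double global;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %f (threads per rank: %d)\n",
               global, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}
```

Compile with `mpicc -fopenmp`, launch one rank per node, and set `OMP_NUM_THREADS` to the core count; the point is that cores on the same node then share data directly instead of going through the MPI layer.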