I have implemented a hybrid OpenMP/MPI parallel algorithm and would like to measure its strong-scaling parallel efficiency. For this I would have to calculate the speed-up S = t(1)/t(N), and then the efficiency E = S/N.
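For context, I measure t(N) roughly like this (a minimal sketch; solve() is just a stand-in for my actual hybrid kernel, and the timing details are simplified):

    #include <mpi.h>
    #include <stdio.h>

    /* Stand-in for the actual hybrid OpenMP/MPI kernel. */
    static void solve(void) { /* ... real computation ... */ }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, nranks;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        MPI_Barrier(MPI_COMM_WORLD);      /* synchronize before timing */
        double t_start = MPI_Wtime();

        solve();                          /* region being benchmarked */

        /* The slowest rank determines the wall-clock time t(N). */
        double t_local = MPI_Wtime() - t_start, t_max = 0.0;
        MPI_Reduce(&t_local, &t_max, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("t(%d) = %g s\n", nranks, t_max);

        MPI_Finalize();
        return 0;
    }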
Background: Having done some analysis, I was able to show that the peak efficiency of the algorithm is to be expected at a problem size at which a single node of my benchmark cluster cannot hold the required data.
Possible solutions: I can either (a) compute the speed-up relative to the smallest run that fits in memory, e.g. S = t(4)/t(N), or (b) estimate t(1) by extrapolation, and then use that value as the reference.

Questions: Which of these two approaches is the better practice?
Bonus question: When we measure t(1), should we run the algorithm with simulated communication calls (i.e. by calling mpirun -n 1 ./my_benchmark_program), or should we rather call a version of the program which performs no communication at all (i.e. ./my_openmp_only_benchmark_program)?
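To make the bonus question concrete, the two variants I have in mind would differ roughly like this (illustrative only; halo_exchange() and the NO_MPI macro are made-up names, not my actual code):

    #ifndef NO_MPI
    #include <mpi.h>
    #endif

    /* Variant (a): compile as-is and run "mpirun -n 1", so the exchange
     * below becomes a self-to-self (or null) message on a single rank.
     * Variant (b): compile with -DNO_MPI, so no communication code is
     * present at all. */
    void halo_exchange(double *sendbuf, double *recvbuf, int count,
                       int left, int right)
    {
    #ifndef NO_MPI
        MPI_Sendrecv(sendbuf, count, MPI_DOUBLE, right, 0,
                     recvbuf, count, MPI_DOUBLE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    #else
        (void)sendbuf; (void)recvbuf; (void)count; (void)left; (void)right;
    #endif
    }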
I hope this post is clear, please ask for clarification if it isn't. Any help will be greatly appreciated. Thanks in advance.
There are various problems with the classical definition of speedup if you are using MPI. The single-processor case involves no communication, while the two-processor one does, so there is overhead in the t(2) case and it will always be less than twice as fast. This is even worse if you have a multicore/multinode setup, where up to 16 (or so) processes will run on a single node, so t(17) will suddenly be much slower because it starts involving a second node.
This means you cannot simply apply the textbook formulas. You need to explain how you are doing your scalability study, for instance: one process per node until the number of processes equals the number of nodes, then start putting multiple processes on each node, et cetera.
The fact that the single-process case does not fit in memory is then a minor hiccup: you start with a base case of multiple processes, document that fact, and give your reasoning for the base case that you actually used.
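For the record, with a base case of p0 processes the bookkeeping is just relative speedup and relative efficiency, which reduce to the textbook definitions when p0 = 1. A minimal sketch (function names are mine, not tied to any particular code):

    /* Relative speedup and efficiency with respect to a base case of
     * p0 processes (the smallest run that fits in memory).
     * t_p0 and t_n are measured wall-clock times. */
    double relative_speedup(double t_p0, double t_n)
    {
        return t_p0 / t_n;                 /* S(n) = t(p0) / t(n) */
    }

    double relative_efficiency(double t_p0, int p0, double t_n, int n)
    {
        return (p0 * t_p0) / (n * t_n);    /* E(n) = p0*t(p0) / (n*t(n)) */
    }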