I understand my question title is rather broad; I am new to parallel programming and OpenMP. I tried to parallelize a C++ solution to the N-body problem and study it for different schedule types and granularities. I collected data by running the program for the different cases and plotted it. This is what I got (performance vs. number of threads; performance can be assumed to be proportional to MegaFLOPS):
[Figure: Performance vs Number of Threads]
I was surprised to see that static scheduling generally did better than dynamic scheduling for this problem. Can anyone explain the possible reasons for this behavior?
Your results do not show a strong enough difference between dynamic and static scheduling to draw conclusions. Since you want to see how your code scales in parallel, I find speedup (run time on one thread divided by run time on p threads) a more appropriate measure than raw performance. You can also look at other metrics such as weak and strong scaling.
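To make the speedup metric concrete, here is a minimal sketch. The names `Scaling` and `strong_scaling` are hypothetical helpers, not part of OpenMP, and the timings are assumed to be wall-clock seconds you measured yourself for the same problem size.

```cpp
#include <cassert>

// Hypothetical helper type: results of a strong-scaling measurement.
struct Scaling {
    double speedup;     // T(1) / T(p)
    double efficiency;  // speedup / p, ideally close to 1.0
};

// t_serial:   wall-clock time with 1 thread
// t_parallel: wall-clock time with `threads` threads, same problem size
Scaling strong_scaling(double t_serial, double t_parallel, int threads) {
    assert(t_serial > 0.0 && t_parallel > 0.0 && threads > 0);
    const double s = t_serial / t_parallel;
    return {s, s / threads};
}
```

For example, a run that takes 8 s on one thread and 4 s on four threads has a speedup of 2 and an efficiency of 0.5, which means half of the added hardware is effectively wasted.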
With the coarse-grained approach you barely reach a speedup of 2 with either schedule, which is not enough to conclude anything. Moreover, you cannot analyze the results of your fine-grained implementation, since it shows no parallel gain at all (probably because each thread gets too little work to amortize the threading overhead). Get good parallel scalability first.
Generally, I choose static or dynamic scheduling depending on the type of computation I am working on:
Static scheduling when the workload is regular (every iteration costs about the same), such as basic image convolution or naive matrix computation. For instance, static scheduling should be the best option for a Gaussian filter.
Dynamic scheduling when the workload is irregular, such as computing the Mandelbrot set. Dynamic scheduling is a little more complex (chunks are handed out at run time rather than precomputed as in static scheduling), so some overhead might appear.
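The two cases above can be sketched as follows. This is an illustration only: the function names and the chunk size of 16 are assumptions made for the example, not tuned values. Compile with OpenMP enabled (for example `-fopenmp` with GCC or Clang); without it the pragmas are ignored and the code runs serially.

```cpp
#include <vector>

// Regular workload: every iteration costs the same, so equal contiguous
// chunks assigned up front (schedule(static)) balance well at low overhead.
void scale_regular(std::vector<double>& data, double factor) {
    const long n = static_cast<long>(data.size());
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}

// Number of iterations before z = z^2 + c escapes |z| > 2 (capped at max_iter).
// The cost varies a lot from point to point.
int escape_iterations(double cr, double ci, int max_iter) {
    double zr = 0.0, zi = 0.0;
    int it = 0;
    while (it < max_iter && zr * zr + zi * zi <= 4.0) {
        const double t = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = t;
        ++it;
    }
    return it;
}

// Irregular workload: threads grab chunks of 16 points as they become free
// (schedule(dynamic, 16)), trading some scheduling overhead for balance.
std::vector<int> mandelbrot_row(const std::vector<double>& re, double im,
                                int max_iter) {
    std::vector<int> out(re.size());
    const long n = static_cast<long>(re.size());
    #pragma omp parallel for schedule(dynamic, 16)
    for (long i = 0; i < n; ++i) {
        out[i] = escape_iterations(re[i], im, max_iter);
    }
    return out;
}
```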
In your case, the N-body simulation involves quite regular work, so static scheduling should be more appropriate. Getting good parallel scalability is sometimes empirical and depends on your context.
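As a rough illustration of why the work is regular, here is a sketch of a direct-sum acceleration kernel. It is not your code: the `Body` and `Vec3` types, the `accelerations` function and the `softening` parameter are assumptions made for the example. Every outer iteration loops over all other bodies, so each one costs the same. Note that a variant exploiting Newton's third law with a triangular inner loop (`j > i`) would have uneven iteration costs, and the reasoning would change.

```cpp
#include <cmath>
#include <vector>

struct Body { double x, y, z, mass; };  // hypothetical layout
struct Vec3 { double x, y, z; };

// Gravitational acceleration on every body, full O(N^2) direct sum.
std::vector<Vec3> accelerations(const std::vector<Body>& bodies,
                                double G, double softening) {
    const long n = static_cast<long>(bodies.size());
    std::vector<Vec3> acc(bodies.size(), Vec3{0.0, 0.0, 0.0});

    // Each i does exactly n - 1 interactions: uniform cost per iteration.
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
        Vec3 a{0.0, 0.0, 0.0};
        for (long j = 0; j < n; ++j) {
            if (j == i) continue;
            const double dx = bodies[j].x - bodies[i].x;
            const double dy = bodies[j].y - bodies[i].y;
            const double dz = bodies[j].z - bodies[i].z;
            const double r2 = dx * dx + dy * dy + dz * dz
                            + softening * softening;
            const double s = G * bodies[j].mass / (r2 * std::sqrt(r2));
            a.x += s * dx;
            a.y += s * dy;
            a.z += s * dz;
        }
        acc[i] = a;  // each thread writes only its own entries
    }
    return acc;
}
```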
I recommend that you first let OpenMP choose the schedule and chunk size for you, and only then experiment with other settings if needed.
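One way to follow this advice is sketched below; the function names are made up for the example. Omitting the `schedule` clause leaves the choice to the implementation's default, and `schedule(runtime)` lets you switch schedules through the `OMP_SCHEDULE` environment variable without recompiling.

```cpp
#include <vector>

// No schedule clause: the implementation-defined default schedule is used.
double sum_default(const std::vector<double>& v) {
    double total = 0.0;
    const long n = static_cast<long>(v.size());
    #pragma omp parallel for reduction(+ : total)
    for (long i = 0; i < n; ++i) {
        total += v[i];
    }
    return total;
}

// schedule(runtime): the schedule is taken from OMP_SCHEDULE at run time,
// which makes it easy to compare schedules on the same binary.
double sum_runtime(const std::vector<double>& v) {
    double total = 0.0;
    const long n = static_cast<long>(v.size());
    #pragma omp parallel for schedule(runtime) reduction(+ : total)
    for (long i = 0; i < n; ++i) {
        total += v[i];
    }
    return total;
}
```

With the second version you could run, for instance, `OMP_SCHEDULE="static" ./nbody` and then `OMP_SCHEDULE="dynamic,4" ./nbody` (the binary name is a placeholder) and compare timings. OpenMP also offers `schedule(auto)`, which delegates the decision to the compiler and runtime.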