
Why is the #pragma omp critical directive inefficient in some cases?


I have read that using #pragma omp critical on a single statement, as in the code below, is inefficient, but I do not know why.

double area, pi, x;
int i, n;
...
area = 0.0;
#pragma omp parallel for private(x)
for (i = 0; i < n; i++) {
   x = (i+0.5)/n;
#pragma omp critical
   area += 4.0/(1.0 + x*x);
}
pi = area / n;


Solution

  • A naive compiler/runtime would do the following at each iteration:

    • take a lock
    • compute `4.0 / (1.0 + x*x)`
    • perform area += ...
    • release the lock

    An alternative would be not to use locks, but perform area += ... with an atomic instruction.

    In both cases, the threads have to synchronize on every single iteration, which is far less efficient than using a reduction clause, in which each thread accumulates into its own private copy without any synchronization, and the reduction (possibly tree-based) only happens at the end of the parallel region.
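
As a rough illustration of the two alternatives mentioned above, here is a minimal sketch of the same loop written once with an atomic update and once with a reduction clause. The function names `estimate_pi_atomic` and `estimate_pi_reduction` are hypothetical and not part of the original question. `x` is declared inside the loop body, so it is private to each iteration without a `private(x)` clause.

```c
/* Hypothetical helper: same loop as in the question, but the shared
   update uses an atomic instruction instead of a critical section.
   Threads still synchronize on area at every iteration. */
double estimate_pi_atomic(int n)
{
    double area = 0.0;

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        double x = (i + 0.5) / n;
        double term = 4.0 / (1.0 + x * x);  /* computed outside the atomic update */
        #pragma omp atomic
        area += term;
    }
    return area / n;
}

/* Hypothetical helper: reduction version. Each thread accumulates into
   its own private copy of area, and the copies are combined once when
   the loop finishes. */
double estimate_pi_reduction(int n)
{
    double area = 0.0;

    #pragma omp parallel for reduction(+:area)
    for (int i = 0; i < n; i++) {
        double x = (i + 0.5) / n;
        area += 4.0 / (1.0 + x * x);
    }
    return area / n;
}
```

With GCC or Clang, OpenMP is enabled with `-fopenmp`; without that flag the pragmas are ignored and both functions simply run serially. Because the reduction sums the terms in a different order than a serial loop, the two results may differ in the last few digits.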