I have read that using #pragma omp critical on a single statement like this is inefficient, but I do not understand why.
double area, pi, x;
int i, n;
...
area = 0.0;
#pragma omp parallel for private(x)
for (i = 0; i < n; i++) {
    x = (i+0.5)/n;
    #pragma omp critical
    area += 4.0/(1.0 + x*x);
}
pi = area / n;
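A naive compiler/runtime would do the following at each iteration: acquire a lock, perform area += ..., then release the lock. Every thread has to wait for that one lock, so the additions are effectively serialized, and acquiring and releasing the lock typically costs far more than the single addition it protects.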
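For reference, here is the question's loop as a self-contained function, with n passed in as a parameter because the original elides its setup. The name pi_critical is hypothetical. Compile with -fopenmp (GCC/Clang) to get the parallel version; without it the pragmas are ignored and the loop simply runs serially.

```c
/* Hypothetical helper: the question's loop, one critical section per iteration. */
double pi_critical(int n)
{
    double area = 0.0;

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        double x = (i + 0.5) / n;   /* declared in the loop body, so private per thread */
        /* Every iteration enters and leaves the critical section,
           so all threads contend for the same lock n times in total. */
        #pragma omp critical
        area += 4.0 / (1.0 + x * x);
    }
    return area / n;
}
```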
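An alternative would be to avoid locks and perform area += ... with a single atomic instruction (#pragma omp atomic). This is cheaper than a lock, but every thread still updates the same memory location on every iteration.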
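The atomic variant only changes the pragma in front of the update. A minimal sketch, with the hypothetical function name pi_atomic:

```c
/* Hypothetical helper: same loop, but the update is an atomic operation. */
double pi_atomic(int n)
{
    double area = 0.0;

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        double x = (i + 0.5) / n;
        /* No lock, but every iteration still performs an atomic
           read-modify-write on the shared variable area. */
        #pragma omp atomic
        area += 4.0 / (1.0 + x * x);
    }
    return area / n;
}
```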
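In both cases, every iteration synchronizes on the shared variable area, so the threads spend most of their time waiting on each other and the cache line holding area keeps bouncing between cores. This is far less efficient than using a reduction clause, in which each thread accumulates into its own private copy without any synchronization, and the partial results are combined (possibly in a tree) only at the end of the OpenMP region.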
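Below is a sketch of the reduction version (pi_reduction), plus a hand-written equivalent (pi_manual_reduction) that shows roughly the idea behind the clause: a private partial sum per thread and a single synchronized combine per thread at the end. Both names are hypothetical, and a real runtime may combine the partial sums differently, for example in a tree.

```c
/* Hypothetical helper: reduction clause, no synchronization inside the loop. */
double pi_reduction(int n)
{
    double area = 0.0;

    /* Each thread gets a private copy of area initialized to 0;
       the copies are summed into the original area after the loop. */
    #pragma omp parallel for reduction(+:area)
    for (int i = 0; i < n; i++) {
        double x = (i + 0.5) / n;
        area += 4.0 / (1.0 + x * x);
    }
    return area / n;
}

/* Hypothetical helper: the same idea written out by hand. */
double pi_manual_reduction(int n)
{
    double area = 0.0;

    #pragma omp parallel
    {
        double local = 0.0;          /* per-thread partial sum */

        #pragma omp for
        for (int i = 0; i < n; i++) {
            double x = (i + 0.5) / n;
            local += 4.0 / (1.0 + x * x);
        }

        /* Entered once per thread, not once per iteration. */
        #pragma omp critical
        area += local;
    }
    return area / n;
}
```

Because the partial sums are added in a different order than the serial loop, the result can differ in the last few bits depending on the number of threads; for this integral that is far below the accuracy of the approximation itself.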