I am confused about the data-sharing scope of the variable acc in the following two cases. In case 1 I get the following compilation error: error: reduction variable ‘acc’ is private in outer context, whereas case 2 compiles without any issues.
According to this article, variables defined outside a parallel region are shared.
Why does adding for-loop parallelism privatize acc? How can I, in this case, accumulate the result calculated in the for-loop and distribute the loop's iteration space across a thread team?
case 1
float acc = 0.0f;
#pragma omp for simd reduction(+: acc)
for (int k = 0; k < MATRIX_SIZE; k++) {
    float mul = alpha;
    mul *= a[i * MATRIX_SIZE + k];
    mul *= b[j * MATRIX_SIZE + k];
    acc += mul;
}
case 2
float acc = 0.0f;
#pragma omp simd reduction(+: acc)
for (int k = 0; k < MATRIX_SIZE; k++) {
    float mul = alpha;
    mul *= a[i * MATRIX_SIZE + k];
    mul *= b[j * MATRIX_SIZE + k];
    acc += mul;
}
Your case 1 violates OpenMP semantics: there is an implicit parallel region (see OpenMP Language Terminology, "sequential part") that contains the definition of acc. Thus, acc is indeed private to that implicit parallel region. A reduction clause on a worksharing construct such as for requires the variable to be shared in the parallel region the construct binds to, and this is what the compiler complains about.
Your case 2 is different in that the simd construct is not a worksharing construct and thus has different semantics for the reduction clause.
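To make the sharing rule concrete, here is a minimal sketch of the worksharing form with an explicit parallel region. The function name scaled_dot and its parameters are made up for illustration and are not from your code; the point is only that acc is declared before the parallel region, so it is shared inside it and the reduction on the for simd construct is accepted.

```c
#include <assert.h>

/* Hypothetical helper (not from the question): returns the sum of
   alpha * x[k] * y[k] for k in [0, n). */
float scaled_dot(const float *x, const float *y, int n, float alpha)
{
    assert(n >= 0);

    /* Declared outside the parallel region below, so it is shared there. */
    float acc = 0.0f;

    #pragma omp parallel
    {
        /* The worksharing loop binds to the enclosing parallel region.
           acc is shared in that region, so the reduction clause is valid. */
        #pragma omp for simd reduction(+: acc)
        for (int k = 0; k < n; k++) {
            acc += alpha * x[k] * y[k];
        }
    }
    return acc;
}
```

If acc were declared inside the braces of the parallel region instead, each thread would get its own private copy, and the compiler would reject the reduction on the for simd construct.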
Case 1 would be correct if you wrote it this way:
void example(void) {
    /* i, j, alpha, a, b and MATRIX_SIZE are assumed to be declared elsewhere. */
    float acc = 0.0f;
    #pragma omp parallel for simd reduction(+: acc)
    for (int k = 0; k < MATRIX_SIZE; k++) {
        float mul = alpha;
        mul *= a[i * MATRIX_SIZE + k];
        mul *= b[j * MATRIX_SIZE + k];
        acc += mul;
    }
}
The acc variable is now defined outside of the parallel region that the for simd construct binds to.
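Since your indices i and j suggest the k loop sits inside two outer loops, an alternative you might consider is to distribute the outer iterations across the thread team and keep only simd on the inner loop. This is a sketch under assumptions: the function name scaled_abt, the output array c, and the value of MATRIX_SIZE are mine, not taken from your code. It relies on the same fact as your case 2, namely that the simd construct accepts a reduction on a variable that is private to the thread.

```c
#include <assert.h>

#define MATRIX_SIZE 4  /* assumed value, for illustration only */

/* Hypothetical helper: c[i][j] = sum over k of alpha * a[i][k] * b[j][k],
   with all matrices stored row-major in flat arrays. */
void scaled_abt(const float *a, const float *b, float *c, float alpha)
{
    assert(a != 0 && b != 0 && c != 0);

    /* The (i, j) iteration space is split across the thread team. */
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < MATRIX_SIZE; i++) {
        for (int j = 0; j < MATRIX_SIZE; j++) {
            /* Declared inside the parallel loop, so private to the thread. */
            float acc = 0.0f;

            /* simd is not a worksharing construct, so reducing into a
               private variable is allowed here. */
            #pragma omp simd reduction(+: acc)
            for (int k = 0; k < MATRIX_SIZE; k++) {
                float mul = alpha;
                mul *= a[i * MATRIX_SIZE + k];
                mul *= b[j * MATRIX_SIZE + k];
                acc += mul;
            }
            c[i * MATRIX_SIZE + j] = acc;
        }
    }
}
```

Each thread computes whole (i, j) entries on its own, so no cross-thread reduction is needed; the inner loop is only vectorized.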