I'm trying to parallelize these loops with OpenMP's collapse clause, but this is what I got: "smooth.c:47:6: error: not enough perfectly nested loops before ‘sum’ sum = 0;"
Does anybody know a good way to parallelize this? I've been stuck on this problem for 2 days.
Here are my loops:
long long int sum;
#pragma omp parallel for collapse(3) default(none) shared(DY, DX) private(dx, dy) reduction(+:sum)
for (y = 0; y < height; y++) {
    for (x = 0; x < width; x++) {
        sum = 0;
        for (d = 0; d < 9; d++) {
            dx = x + DX[d];
            dy = y + DY[d];
            if (dx >= 0 && dx < width && dy >= 0 && dy < height)
                sum += image(dy, dx);
        }
        smooth(y, x) = sum / 9;
    }
}
Full code: https://github.com/fernandesbreno/smooth_
You cannot collapse three loop levels because the third level is not perfectly nested inside the second. There is
sum = 0;
before it and
smooth(y, x) = sum / 9;
after it in the middle loop. (I suppose smooth() is a macro; otherwise the assignment would not make sense. Don't do that, though, because it's confusing.)
Consider how you would rewrite that loop nest into an equivalent single loop by hand, using your knowledge of the problem structure and details. I submit that it would be challenging to do so, and that the result would furthermore have unavoidable data dependencies. But if you managed to do it without introducing dependencies, then voila! You have a single flat loop to parallelize, no collapsing needed.
Your simplest way forward, however, would probably be to collapse only two levels instead of three. You should also benchmark that against not collapsing at all: it is far from clear that collapsing will improve on parallelizing only the outer loop, and it might even be slower.
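To make the two-level option concrete, here is a minimal sketch. It assumes a row-major int raster and a 3x3 neighbourhood for DX/DY, since the question shows neither their values nor the definitions of the image() and smooth() macros. The function name smooth_collapse2 and its parameters are hypothetical. Note that sum is declared inside the x loop, which makes it private to each iteration, so no reduction clause is involved.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed 3x3 neighbourhood offsets, centre included. */
static const int DX[9] = {-1, 0, 1, -1, 0, 1, -1, 0, 1};
static const int DY[9] = {-1, -1, -1, 0, 0, 0, 1, 1, 1};

/* Hypothetical helper: box-smooth a row-major height x width raster. */
void smooth_collapse2(const int *image, int *smooth, int height, int width)
{
    assert(image != NULL && smooth != NULL && height > 0 && width > 0);

    /* Only the y and x loops are collapsed; nothing sits between them,
       so they are perfectly nested. Variables declared inside the loops
       are private; everything else is shared by default. */
    #pragma omp parallel for collapse(2)
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            long long sum = 0;  /* one accumulator per pixel */
            for (int d = 0; d < 9; d++) {
                int dx = x + DX[d];
                int dy = y + DY[d];
                if (dx >= 0 && dx < width && dy >= 0 && dy < height)
                    sum += image[(size_t)dy * width + dx];
            }
            smooth[(size_t)y * width + x] = (int)(sum / 9);
        }
    }
}
```

Compile with OpenMP enabled (for example gcc -fopenmp). Deleting collapse(2) from the pragma gives the outer-loop-only variant to time against.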
But if you must have OpenMP collapse all three levels of the nest, then you need to take the two lines I called out above, and lift them out of the loop nest. Possibly you could do that in part by getting rid of sum altogether and working directly with the result raster. Again, this is not necessarily going to produce an improvement.
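For illustration, one way the lifting might look is sketched below, under the same assumptions as before: a row-major int raster, a 3x3 neighbourhood for DX/DY, and a hypothetical function name, smooth_collapse3. The zeroing and the division become separate passes, and the accumulation goes straight into the result raster. Because iterations with different d for the same pixel can now land on different threads, the update needs to be atomic, which is one reason this version may well be slower.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed 3x3 neighbourhood offsets, centre included. */
static const int DX[9] = {-1, 0, 1, -1, 0, 1, -1, 0, 1};
static const int DY[9] = {-1, -1, -1, 0, 0, 0, 1, 1, 1};

/* Hypothetical helper; image and smooth must not overlap. */
void smooth_collapse3(const int *image, int *smooth, int height, int width)
{
    assert(image != NULL && smooth != NULL && height > 0 && width > 0);
    long long n = (long long)height * width;

    /* Lifted "sum = 0": clear the result raster up front. */
    #pragma omp parallel for
    for (long long i = 0; i < n; i++)
        smooth[i] = 0;

    /* Nothing sits between the three loop headers any more,
       so the nest is perfectly nested and collapse(3) is accepted. */
    #pragma omp parallel for collapse(3)
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            for (int d = 0; d < 9; d++) {
                int dx = x + DX[d];
                int dy = y + DY[d];
                if (dx >= 0 && dx < width && dy >= 0 && dy < height) {
                    /* Several threads may update the same pixel. */
                    #pragma omp atomic
                    smooth[(long long)y * width + x] += image[(long long)dy * width + dx];
                }
            }
        }
    }

    /* Lifted "smooth(y, x) = sum / 9": divide in a final pass. */
    #pragma omp parallel for
    for (long long i = 0; i < n; i++)
        smooth[i] /= 9;
}
```

Accumulating in the int raster itself assumes that nine pixel values added together still fit in an int.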