I have an array like this (0,0 is bottom left):
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0
0 0 1 0 1 0 1 0 0
0 0 1 1 1 1 1 1 1
1 0 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1
My goal is to get the index of the highest row that is not completely set to 0. For this I wrote the code below (which works fine):
max=0;
for (i=0 ; i<width ; ++i) {
    for (j=max ; j<height ; ++j) {
        if (array[i*height+j]!=0) {
            max=j;
        }
    }
}
For the second (inner) loop I initialize j to max, because the global maximum cannot be less than a local maximum; this way I reduce the number of tests.
Then I tried to parallelize it with OpenMP. My code is now:
max=0;
#pragma omp parallel for default(none) \
    shared(array, width, height) \
    collapse(2) \
    reduction(max:max)
for (i=0 ; i<width ; ++i) {
    for (j=max ; j<height ; ++j) {
        if (array[i*height+j]!=0) {
            max=j;
        }
    }
}
This leads to a segmentation fault. In order to make it work, I changed j=max to j=0, so the problem seems to come from the max variable.
I don't understand why: with the reduction, this variable should be private (or lastprivate) in each thread. So why does it crash? And how can I use my "optimization" with OpenMP?
First of all, the user High Performance Mark is right in his comment: you shouldn't use collapse when the bounds of an inner loop depend on a value computed inside the loop nest. In your example, "j" depends on "max", which will produce an incorrect result. However, this is not the cause of your segmentation fault.
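For reference, a rectangular variant that keeps collapse(2) but drops the j=max shortcut (the change you already tried yourself) could look like the sketch below; it assumes array, width, height, i, j and max are declared as in your snippet:

max=0;
#pragma omp parallel for default(none) \
    shared(array, width, height) \
    collapse(2) \
    reduction(max:max)
for (i=0 ; i<width ; ++i) {
    /* j starts at 0, so the iteration space is rectangular */
    for (j=0 ; j<height ; ++j) {
        if (array[i*height+j]!=0) {
            max=j;
        }
    }
}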
I would suggest you debug your example so that you can find the source of the crash: "max" is being initialized to a very negative number by default, which makes "j" start at that value as well. Thus, when trying to access array[i*height+(-2147483648)], you get a segmentation fault.
This happens because OpenMP specifies an initial value for each reduction operator. In the case of the max operator, you can find the following description in the specification of OpenMP 3.1:
max Least representable value in the reduction list item type
In our case, that means that, at the start of the parallel region, each thread will have a private copy of the max variable holding the lowest value that can be stored in an int (usually -2147483648).
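If you want to see this in action, here is a minimal, self-contained sketch (my own example, not taken from your code) that prints each thread's private copy of the reduction variable before it is modified:

#include <limits.h>
#include <omp.h>
#include <stdio.h>

int main(void)
{
    int max = 0;   /* value of the original variable before the region */

    #pragma omp parallel reduction(max:max)
    {
        /* Each private copy starts at the least representable int value */
        printf("thread %d: initial max = %d (INT_MIN = %d)\n",
               omp_get_thread_num(), max, INT_MIN);
    }
    return 0;
}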
I've written a very rudimentary workaround for your example. I removed the collapse clause and I'm initializing the max variable manually at the start of the parallel region:
#pragma omp parallel default(none) private(j) shared(array, width, height) reduction(max:max)
{
    // Explicit initialization
    max = 0;
    #pragma omp for
    for (i=0 ; i<width ; ++i) {
        for (j=max ; j<height ; ++j) {
            if (array[i*height+j]!=0) {
                max=j;
            }
        }
    }
}
As an extra remark, you shouldn't need to set max=j every time. You could instead stop at the first 0 found in a column and use the previous position, as in the sketch below.
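This only works if every column is filled contiguously from the bottom, as it is in your sample data; under that assumption, a per-column helper (the name column_top is just for illustration) could look like this:

/* Assumes each column is a solid block of non-zeros starting at row 0,
 * so the first 0 marks one past the top of that column. */
int column_top(const int *array, int height, int i)
{
    int j;
    for (j = 0; j < height; ++j) {
        if (array[i*height + j] == 0) {
            break;              /* first 0 found */
        }
    }
    return j - 1;               /* previous position; -1 if the column is empty */
}

The outer loop would then simply take the maximum of column_top(array, height, i) over all columns, with the same max reduction as before.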
Hope it helps