Search code examples
copenmpreduction

User defined reduction not returning expected result with each run


I'm trying to find, parallelly using OpenMP, the minimum and maximum values in a 2d array as well as the indexes of the minimum and maximum. In my attempt, I use user-defined reductions, however I get unexpected results for each run.

I've tried inspecting the values of min and max in the for loops and it seems that inside the parallelized for loops, the min and max values are as expected. However, at the end of the run, min and max contain completely wacko values.

My reduction definition

typedef struct {
    int value;
    int index_i;
    int index_j;
} Point;

#pragma omp declare reduction(minimum : Point : \
    omp_out = omp_in.value < omp_out.value ? omp_in : omp_out) \
    initializer(omp_priv = {INT_MAX, 0, 0})
#pragma omp declare reduction(maximum : Point : \
    omp_out = omp_in.value > omp_out.value ? omp_in : omp_out) \
    initializer(omp_priv = {0, 0, 0})

Initialization of the 2d array where size is 10000

for (int i = 0; i < size; i++) {
    for (int j = 0; j < size; j++) {
        matrix[i][j] = rand()%99;
    }
}

The parallelized loops:

int i, j, total=0;
Point min, max;

#pragma omp parallel for reduction (+:total) reduction(minimum : min) reduction(maximum : max) private(j)
    for (i = 0; i < size; i++) {
        for (j = 0; j < size; j++) {
            total += matrix[i][j];

            if (matrix[i][j] < min.value) {
                min.value = matrix[i][j];
                min.index_i = i;
                min.index_j = j;
            }


            if (matrix[i][j] > max.value) {
                max.value = matrix[i][j];
                max.index_i = i;
                max.index_j = j;
            }
        }
    }

The expected result is that min = 0 at index (0, 70) and max = 98 at index (0, 20).

The actual results are different each time but an example output:

The min is -290390323 at index (21850, -9176672)
The max is 32595 at index (0, 0)

Solution

  • Part of the idea of OpenMP is that it enables parallelization of existing, correct serial code. Generally speaking, removing or ignoring all the omp pragmas from correct OpenMP code -- so that it runs strictly serially -- should not change the computed result. Your code does not satisfy that requirement because you do not initialize the min and max accumulation variables.

    I guess you expected the initialization clause of your reduction definition to be applied to the shared variables, but you have misunderstood. The initialization clause is used to initialize the per-thread local copies, not the shared variables. The local copies are at some point combined with the shared variables as part of the reduction, otherwise the code would not produce the same result when run serially.

    Additionally, note that for C, OpenMP reduction initializer clauses in fact provide initializers, in the C standard's sense of the term. These are not the same thing as assignment statements, and the difference is especially clear in your case, where the list item has a structure type. Your initializers are fine as initializers, but they are not valid assignment expressions. Therefore, they cannot be used to assign initial values to the shared variables inside the parallel region, for initializers appear only as part of variable declarations.