Search code examples
cparallel-processingmultiprocessingopenmp

Why can't reduction be used with OpenMP tasks?


This is the code for calculating a numerical approximation of the area of the Mandelbrot Set. I don't understand why can't I write reduction(+ : numoutside) instead of shared(numoutside) when declaring variables for the parallel region. When I do that, I get incorrect result. Can someone explain the problem with using reduction with OpenMP tasks, because it really confuses me why it doesn't work?

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <omp.h>

#define NPOINTS 2000
#define MAXITER 2000

struct complex
{
    double real;
    double imag;
};

int main()
{
    int i, j, iter, numoutside = 0;
    double area, error, ztemp;
    struct complex z, c;

    double time1, time2, elapsed;

    /*
     *
     *
     *     Outer loops run over npoints, initialise z=c
     *
     *     Inner loop has the iteration z=z*z+c, and threshold test
     */

    time1 = omp_get_wtime();

#pragma omp parallel default(none) private(i, j, iter, c, z, ztemp) \
    shared(numoutside) // why not reduction(+ : numoutside) ???
    {
#pragma omp single
        {
            for (i = 0; i < NPOINTS; i++)
            {
#pragma omp task
                for (j = 0; j < NPOINTS; j++)
                {
                    // #pragma omp task - LOSE NE TREBA OVDE - NAPRAVICE SE PREVISE
                    // TASKOVA (NPOINTS * NPOINTS) I USPORICEMO SISTEM UMESTO DA UBRZAMO
                    {
                        c.real = -2.0 + 2.5 * (double)(i) / (double)(NPOINTS) + 1.0e-7;
                        c.imag = 1.125 * (double)(j) / (double)(NPOINTS) + 1.0e-7;
                        z = c;
                        for (iter = 0; iter < MAXITER; iter++)
                        {
                            ztemp = (z.real * z.real) - (z.imag * z.imag) + c.real;
                            z.imag = z.real * z.imag * 2 + c.imag;
                            z.real = ztemp;
                            if ((z.real * z.real + z.imag * z.imag) > 4.0e0)
                            {
#pragma omp atomic
                                numoutside++;
                                break;
                            }
                        }
                    }
                }
            }
        }
    }

    time2 = omp_get_wtime();
    elapsed = time2 - time1;
    printf("elapsed %f\n", elapsed);

    /*
     *  Calculate area and error and output the results
     */

    area = 2.0 * 2.5 * 1.125 * (double)(NPOINTS * NPOINTS - numoutside) / (double)(NPOINTS * NPOINTS);
    error = area / (double)NPOINTS;

    printf("Area of Mandlebrot set = %12.8f +/- %12.8f\n", area, error);
}

Solution

  • OpenMP task rules are a bit tricky. A variable that is shared in the enclosing OpenMP construct is by default still shared in the task, otherwise it is firstprivate.

    What happens:

    • numoutside is declared in a reduction clause, so the compiler create private instances of numoutside for each thread.
    • Inside the single directive, only one of these instances is used.
    • Because numoutside was not shared, it becomes firstprivate in each tasks: the compiler creates a fully new instance for each task. Whatever you doing on this instance within the task, it is lost once the task ends.
    • At the end you get zero.

    Note that declaring numoutside as shared will work only by chance if you don't protect the numoutside++ instruction with an atomic directive.

    I don't know how to perform reductions with the task contructs (I've seen that there exists a task_reduction clause in the OmpenMP standard, but I don't know how it works).

    But actually, I wonder why you want to use tasks instead of a simple parallel loop...