Tags: c, multithreading, parallel-processing, openmp, race-condition

Difference between two methods for OpenMP parallel


I want to evaluate the integral of a function as a sum with OpenMP, in two ways: first by using an array to hold the value computed at every step and then summing the array, and second by summing directly without the array.

The code is:

#include <math.h>

double f(double x)
{
    return sin(x)*sin(x)/(x*x+1);
}

METHOD 1

    long i = 0;
    const long NUM_STEP = 100000;
    double sum[NUM_STEP];
    double from = 0.0, to = 1.0;
    double step = (to - from)/NUM_STEP;
    double result = 0;

    #pragma omp parallel for shared(sum) num_threads(4)
    for(i=0; i<NUM_STEP; i++)
        sum[i] = step*f(from+i*step);
    for(i=0; i<NUM_STEP; i++)
        result += sum[i];
    printf("%lf", result);

METHOD 2

    long i = 0;
    const long NUM_STEP = 100000;
    double from = 0.0, to = 1.0;
    double step = (to - from)/NUM_STEP;
    double result = 0;

    #pragma omp parallel for shared(result) num_threads(4)
    for(i=0; i<NUM_STEP; i++)
        result += step*f(from+i*step);
    printf("%lf", result);

But the results are very different. METHOD 1 gives a stable value, while METHOD 2 gives a value that changes from run to run. Here is an example:

METHOD 1: 0.178446

METHOD 2: 0.158738

The value from METHOD 1 is correct (checked with another tool).


Solution

  • TL;DR The first method has no race-condition, whereas the second one does.

    Namely, in the first method:

    #pragma omp parallel for shared(sum) num_threads(4)
    for(i=0; i<NUM_STEP; i++)
        sum[i] = step*f(from+i*step);
    for(i=0; i<NUM_STEP; i++)
        result += sum[i];
    

    each thread saves the result of the operation step*f(from+i*step) in a different position of the array, sum[i]. Afterwards, the master thread sequentially reduces the values saved in the array sum, namely:

    for(i=0; i<NUM_STEP; i++)
        result += sum[i];
    

    Actually, there is a slight improvement that you can make to this version: instead of allocating the array sum with NUM_STEP elements, you can allocate it with one element per thread, and have each thread accumulate into the position given by its ID, namely:

    int total_threads = 4;
    double sum[total_threads];
    for(int t = 0; t < total_threads; t++)
        sum[t] = 0.0;                       // partial sums must start at zero
    #pragma omp parallel num_threads(total_threads)
    {
      int thread_id = omp_get_thread_num();
      #pragma omp for                        // split the iterations across threads
      for(long i = 0; i < NUM_STEP; i++)
          sum[thread_id] += step*f(from+i*step);
    }
    for(int t = 0; t < total_threads; t++)   // final reduction, done by one thread
        result += sum[t];
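
    A self-contained sketch of this per-thread partial-sums idea (the function name `integrate_partial_sums` is illustrative, not from the question) could look like this. The key points are that the partial sums are zero-initialized, the iterations are split across threads with `#pragma omp for`, and the final reduction happens after the parallel region, so only one thread performs it:

    ```c
    #include <math.h>
    #ifdef _OPENMP
    #include <omp.h>
    #else
    /* serial fallback so the sketch also builds without -fopenmp */
    static int omp_get_thread_num(void) { return 0; }
    #endif

    /* integrand from the question */
    static double f(double x)
    {
        return sin(x)*sin(x)/(x*x + 1);
    }

    /* Left Riemann sum of f over [from, to] with one partial sum per thread. */
    static double integrate_partial_sums(double from, double to,
                                         long num_step, int total_threads)
    {
        double step = (to - from)/num_step;
        double sum[total_threads];              /* C99 variable-length array */
        double result = 0.0;

        for (int t = 0; t < total_threads; t++)
            sum[t] = 0.0;                       /* partial sums start at zero */

        #pragma omp parallel num_threads(total_threads)
        {
            int thread_id = omp_get_thread_num();
            #pragma omp for                     /* split iterations across threads */
            for (long i = 0; i < num_step; i++)
                sum[thread_id] += step*f(from + i*step);
        }                                       /* implicit barrier here */

        for (int t = 0; t < total_threads; t++)
            result += sum[t];                   /* sequential final reduction */
        return result;
    }
    ```

    With the question's parameters, `integrate_partial_sums(0.0, 1.0, 100000, 4)` should return roughly the METHOD 1 value of 0.178446; compile with `-fopenmp` to actually run in parallel.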
    

    Nonetheless, the best approach is to fix the second method itself.

    In the second method there is a race-condition on the update of the variable result:

    #pragma omp parallel for shared(result) num_threads(4)
    for(i=0; i<NUM_STEP; i++)
        result += step*f(from+i*step);
    

    because the result variable is updated concurrently by multiple threads in a non-thread-safe manner.
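
    For completeness, one correct but usually slower alternative is to serialize only the update with `#pragma omp atomic`: the terms are still evaluated in parallel, but every addition to result happens one thread at a time. A minimal sketch (`integrate_atomic` is an illustrative name, not part of the question):

    ```c
    #include <math.h>

    /* integrand from the question */
    static double f(double x)
    {
        return sin(x)*sin(x)/(x*x + 1);
    }

    static double integrate_atomic(double from, double to, long num_step)
    {
        double step = (to - from)/num_step;
        double result = 0.0;

        #pragma omp parallel for num_threads(4)
        for (long i = 0; i < num_step; i++) {
            double term = step*f(from + i*step); /* evaluated in parallel */
            #pragma omp atomic
            result += term;                      /* serialized, race-free update */
        }
        return result;
    }
    ```

    The synchronization on every single iteration typically makes this much slower than a reduction, which accumulates into private copies and combines them only once per thread.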

    To solve this race-condition you need to use the reduction clause:

    #pragma omp parallel for reduction(+:result) num_threads(4)
    for(i=0; i<NUM_STEP; i++)
        result += step*f(from+i*step);
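
    As a self-contained sketch of this fix (`integrate_reduction` is an illustrative name): each thread accumulates into a private copy of result, and OpenMP combines the private copies with + when the loop ends, so there is no race and almost no synchronization overhead. Without `-fopenmp` the pragma is simply ignored and the loop runs serially, still producing the correct value:

    ```c
    #include <math.h>

    /* integrand from the question */
    double f(double x)
    {
        return sin(x)*sin(x)/(x*x + 1);
    }

    /* Left Riemann sum of f over [from, to], parallelized with a reduction. */
    double integrate_reduction(double from, double to, long num_step)
    {
        double step = (to - from)/num_step;
        double result = 0.0;

        /* each thread gets a private result; OpenMP sums the copies at the end */
        #pragma omp parallel for reduction(+:result) num_threads(4)
        for (long i = 0; i < num_step; i++)
            result += step*f(from + i*step);

        return result;
    }
    ```

    With the question's parameters, `integrate_reduction(0.0, 1.0, 100000)` should return roughly 0.178446, matching METHOD 1.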