Tags: c, multithreading, parallel-processing, openmp, race-condition

Difference between two methods for OpenMP parallel


I want to evaluate the integral of a function as a sum with OpenMP, in two ways: first by using an array to hold the value computed at every step and then summing the array, and second by summing directly without the array.

The code is:

#include <math.h>

double f(double x)
{
    return sin(x)*sin(x)/(x*x+1);
}

METHOD 1

    long i = 0;
    const long NUM_STEP = 100000;
    double sum[NUM_STEP];
    double from = 0.0, to = 1.0;
    double step = (to - from)/NUM_STEP;
    double result = 0;

    #pragma omp parallel for shared(sum) num_threads(4)
    for(i=0; i<NUM_STEP; i++)
        sum[i] = step*f(from+i*step);
    for(i=0; i<NUM_STEP; i++)
        result += sum[i];
    printf("%lf", result);

METHOD 2

    long i = 0;
    const long NUM_STEP = 100000;
    double from = 0.0, to = 1.0;
    double step = (to - from)/NUM_STEP;
    double result = 0;

    #pragma omp parallel for shared(result) num_threads(4)
    for(i=0; i<NUM_STEP; i++)
        result += step*f(from+i*step);
    printf("%lf", result);

But the results are very different. METHOD 1 gives a stable value, while METHOD 2 gives a value that changes from run to run. Here is an example:

METHOD 1: 0.178446

METHOD 2: 0.158738

The value from METHOD 1 is correct (checked with another tool).


Solution

  • TL;DR The first method has no race-condition, whereas the second one does.

    Namely, in the first method:

    #pragma omp parallel for shared(sum) num_threads(4)
    for(i=0; i<NUM_STEP; i++)
        sum[i] = step*f(from+i*step);
    for(i=0; i<NUM_STEP; i++)
        result += sum[i];
    

    each thread saves the result of the operation step*f(from+i*step) in a different position of the array, sum[i]. Afterwards, the master thread sequentially reduces the values saved in the array sum, namely:

    for(i=0; i<NUM_STEP; i++)
        result += sum[i];
    

    Actually, there is a slight improvement that you can make to this version: instead of allocating the array sum with NUM_STEP elements, you can allocate it with one element per thread, and have each thread accumulate into the position given by its ID, namely:

    int total_threads = 4;
    double sum[total_threads];
    for(int t = 0; t < total_threads; t++)
        sum[t] = 0.0;                       // partial sums must start at zero
    #pragma omp parallel num_threads(total_threads)
    {
      int thread_id = omp_get_thread_num();
      #pragma omp for                        // split the iterations across threads
      for(long i = 0; i < NUM_STEP; i++)
          sum[thread_id] += step*f(from+i*step);
    }
    for(int t = 0; t < total_threads; t++)   // final reduction, done by one thread
        result += sum[t];
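
    A self-contained sketch of this per-thread partial-sums idea (the function name `integrate_partial_sums` is illustrative, not from the question) could look like this. The key points are that the partial sums are zero-initialized, the iterations are split across threads with `#pragma omp for`, and the final reduction happens after the parallel region, so only one thread performs it:

    ```c
    #include <math.h>
    #ifdef _OPENMP
    #include <omp.h>
    #else
    /* serial fallback so the sketch also builds without -fopenmp */
    static int omp_get_thread_num(void) { return 0; }
    #endif

    /* integrand from the question */
    static double f(double x)
    {
        return sin(x)*sin(x)/(x*x + 1);
    }

    /* Left Riemann sum of f over [from, to] with one partial sum per thread. */
    static double integrate_partial_sums(double from, double to,
                                         long num_step, int total_threads)
    {
        double step = (to - from)/num_step;
        double sum[total_threads];              /* C99 variable-length array */
        double result = 0.0;

        for (int t = 0; t < total_threads; t++)
            sum[t] = 0.0;                       /* partial sums start at zero */

        #pragma omp parallel num_threads(total_threads)
        {
            int thread_id = omp_get_thread_num();
            #pragma omp for                     /* split iterations across threads */
            for (long i = 0; i < num_step; i++)
                sum[thread_id] += step*f(from + i*step);
        }                                       /* implicit barrier here */

        for (int t = 0; t < total_threads; t++)
            result += sum[t];                   /* sequential final reduction */
        return result;
    }
    ```

    With the question's parameters, `integrate_partial_sums(0.0, 1.0, 100000, 4)` should return roughly the METHOD 1 value of 0.178446; compile with `-fopenmp` to actually run in parallel.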
    

    Nonetheless, the best approach is to fix the second method itself.

    In the second method there is a race-condition on the update of the variable result:

    #pragma omp parallel for shared(result) num_threads(4)
    for(i=0; i<NUM_STEP; i++)
        result += step*f(from+i*step);
    

    because the result variable is updated concurrently by multiple threads in a non-thread-safe manner.
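
    For completeness, one correct but usually slower alternative is to serialize only the update with `#pragma omp atomic`: the terms are still evaluated in parallel, but every addition to result happens one thread at a time. A minimal sketch (`integrate_atomic` is an illustrative name, not part of the question):

    ```c
    #include <math.h>

    /* integrand from the question */
    static double f(double x)
    {
        return sin(x)*sin(x)/(x*x + 1);
    }

    static double integrate_atomic(double from, double to, long num_step)
    {
        double step = (to - from)/num_step;
        double result = 0.0;

        #pragma omp parallel for num_threads(4)
        for (long i = 0; i < num_step; i++) {
            double term = step*f(from + i*step); /* evaluated in parallel */
            #pragma omp atomic
            result += term;                      /* serialized, race-free update */
        }
        return result;
    }
    ```

    The synchronization on every single iteration typically makes this much slower than a reduction, which accumulates into private copies and combines them only once per thread.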

    To solve this race-condition you need to use the reduction clause:

    #pragma omp parallel for reduction(+:result) num_threads(4)
    for(i=0; i<NUM_STEP; i++)
        result += step*f(from+i*step);
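
    As a self-contained sketch of this fix (`integrate_reduction` is an illustrative name): each thread accumulates into a private copy of result, and OpenMP combines the private copies with + when the loop ends, so there is no race and almost no synchronization overhead. Without `-fopenmp` the pragma is simply ignored and the loop runs serially, still producing the correct value:

    ```c
    #include <math.h>

    /* integrand from the question */
    double f(double x)
    {
        return sin(x)*sin(x)/(x*x + 1);
    }

    /* Left Riemann sum of f over [from, to], parallelized with a reduction. */
    double integrate_reduction(double from, double to, long num_step)
    {
        double step = (to - from)/num_step;
        double result = 0.0;

        /* each thread gets a private result; OpenMP sums the copies at the end */
        #pragma omp parallel for reduction(+:result) num_threads(4)
        for (long i = 0; i < num_step; i++)
            result += step*f(from + i*step);

        return result;
    }
    ```

    With the question's parameters, `integrate_reduction(0.0, 1.0, 100000)` should return roughly 0.178446, matching METHOD 1.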