I want to evaluate the integral of a function numerically by summation with OpenMP, in two ways: using an array to hold the value computed at every step and then summing the array, and accumulating the sum directly without the array.
The code is:
double f(double x)
{
return sin(x)*sin(x)/(x*x+1);
}
METHOD 1
long i = 0;
const long NUM_STEP = 100000;
double sum[NUM_STEP];
double from = 0.0, to = 1.0;
double step = (to - from)/NUM_STEP;
double result = 0;
#pragma omp parallel for shared(sum) num_threads(4)
for(i=0; i<NUM_STEP; i++)
sum[i] = step*f(from+i*step);
for(i=0; i<NUM_STEP; i++)
result += sum[i];
printf("%lf", result);
METHOD 2
long i = 0;
const long NUM_STEP = 100000;
double from = 0.0, to = 1.0;
double step = (to - from)/NUM_STEP;
double result = 0;
#pragma omp parallel for shared(result) num_threads(4)
for(i=0; i<NUM_STEP; i++)
result += step*f(from+i*step);
printf("%lf", result);
But the results are very different. METHOD 1 gives a stable value, but METHOD 2 gives a value that changes from run to run. Here is an example:
METHOD 1: 0.178446
METHOD 2: 0.158738
The value from METHOD 1 is correct (checked with another tool).
TL;DR The first method does not have a race condition, whereas the second one does.
In the first method:
#pragma omp parallel for shared(sum) num_threads(4)
for(i=0; i<NUM_STEP; i++)
sum[i] = step*f(from+i*step);
for(i=0; i<NUM_STEP; i++)
result += sum[i];
each thread saves the result of the operation step*f(from+i*step) in a different position of the array sum. After that, the master thread sequentially reduces the values saved in the array sum, namely:
for(i=0; i<NUM_STEP; i++)
result += sum[i];
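For reference, here is a minimal complete program for this first method. This is a sketch: the includes, the main wrapper, and the heap allocation of sum are my additions (allocating 100000 doubles on the stack could overflow it on some systems). Compile with something like gcc -fopenmp integrate.c -o integrate -lm:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

double f(double x)
{
    return sin(x)*sin(x)/(x*x+1);
}

int main(void)
{
    const long NUM_STEP = 100000;
    double from = 0.0, to = 1.0;
    double step = (to - from)/NUM_STEP;
    double result = 0;
    double *sum = malloc(NUM_STEP * sizeof(double));  /* heap instead of stack */
    long i;

    #pragma omp parallel for shared(sum) num_threads(4)
    for(i=0; i<NUM_STEP; i++)
        sum[i] = step*f(from+i*step);  /* each index is written by exactly one thread */

    for(i=0; i<NUM_STEP; i++)          /* sequential reduction, no race */
        result += sum[i];

    printf("%lf\n", result);
    free(sum);
    return 0;
}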
Actually, there is a slight improvement you can make to this version: instead of allocating the array sum with NUM_STEP elements, you can allocate it with one element per thread, and have each thread accumulate into the position equal to its thread ID, namely:
int total_threads = 4;
double sum[total_threads];
for(i=0; i<total_threads; i++)
    sum[i] = 0.0;                      /* the partial sums must start at zero */
#pragma omp parallel num_threads(total_threads)
{
    int thread_id = omp_get_thread_num();  /* requires #include <omp.h> */
    #pragma omp for                    /* split the iterations among the threads */
    for(long j=0; j<NUM_STEP; j++)
        sum[thread_id] += step*f(from+j*step);
}
for(i=0; i<total_threads; i++)         /* sequential reduction outside the parallel region */
    result += sum[i];
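One caveat with this per-thread layout: adjacent elements of sum live on the same cache line, so the threads can slow each other down through false sharing. A common workaround is to pad each slot to a full cache line. The sketch below assumes a 64-byte cache line and reuses NUM_STEP, step, from, f, and result from the snippets above:
#define CACHE_LINE 64                  /* assumed cache-line size in bytes */

struct padded_sum {
    double value;
    char pad[CACHE_LINE - sizeof(double)];  /* keep each slot on its own cache line */
};

struct padded_sum sum[4] = {{0}};      /* one padded, zero-initialized slot per thread */

#pragma omp parallel num_threads(4)
{
    int thread_id = omp_get_thread_num();
    #pragma omp for
    for(long j=0; j<NUM_STEP; j++)
        sum[thread_id].value += step*f(from+j*step);
}

for(int t=0; t<4; t++)
    result += sum[t].value;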
Nonetheless, the best approach is to actually fix the second method.
In the second method there is a race condition on the update of the variable result:
#pragma omp parallel for shared(result) num_threads(4)
for(i=0; i<NUM_STEP; i++)
result += step*f(from+i*step);
because the result variable is updated concurrently by multiple threads in a non-thread-safe manner.
To solve this race condition you need to use the reduction clause:
#pragma omp parallel for reduction(+:result) num_threads(4)
for(i=0; i<NUM_STEP; i++)
result += step*f(from+i*step);
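Putting it together, a minimal complete program with the fix might look like this (a sketch: the includes, the main wrapper, and the compile line gcc -fopenmp integrate.c -o integrate -lm are assumptions, not part of the original code):
#include <stdio.h>
#include <math.h>

double f(double x)
{
    return sin(x)*sin(x)/(x*x+1);
}

int main(void)
{
    const long NUM_STEP = 100000;
    double from = 0.0, to = 1.0;
    double step = (to - from)/NUM_STEP;
    double result = 0;
    long i;

    /* each thread accumulates into a private copy of result;
       OpenMP adds the private copies together at the end of the loop */
    #pragma omp parallel for reduction(+:result) num_threads(4)
    for(i=0; i<NUM_STEP; i++)
        result += step*f(from+i*step);

    printf("%lf\n", result);  /* now stable across runs, matching the first method */
    return 0;
}
This keeps the single-accumulator style of the second method while removing the race, and avoids the O(NUM_STEP) auxiliary array entirely.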