The code below should use 4 threads to calculate the summation of all numbers between 0 and 1000. It should return 499500 but it is returning different values at each execution.
#include <omp.h>
#include <iostream>
using namespace std;
int main (int argc, char *argv[])
{
int nthreads, i, tid;
float total;
#pragma omp parallel num_threads(4)
{
tid = omp_get_thread_num();
if (tid == 0) {
nthreads = omp_get_num_threads();
cout << "Número de threads = " << nthreads <<endl;
}
#pragma omp barrier
total = 0.0;
#pragma omp for schedule(dynamic,10) private(i)
for (i=0; i<1000; i++)
total = total + i*1.0;
}
cout << "Total = " <<total << endl;
return 0;
}
What is happening in your code is that you have multiple threads concurrently modifying the value of the variable total
. To solve this issue, you can use OpenMP reduction
clause, which from the OpenMP standard one can read:
The reduction clause can be used to perform some forms of recurrence calculations (...) in parallel. For parallel and work-sharing constructs, a private copy of each list item is created, one for each implicit task, as if the private clause had been used. (...) The private copy is then initialized as specified above. At the end of the region for which the reduction clause was specified, the original list item is updated by combining its original value with the final value of each of the private copies, using the combiner of the specified reduction-identifier.
For a more detailed explanation on how the reduction clause works have a look a this SO Thread.
So to solve the race-condition in your code just changed it to:
#pragma omp for schedule(dynamic,10) private(i) reduction(+:total)
for (i=0; i<1000; i++)
total = total + i*1.0;
After applying the reduction clause, to avoid wrong results, do not initialize the shared variable total
within the parallel region:
total = 0.0;
simply set it before the parallel region.
There is another race condition regarding the variable tid itself, which is shared among threads and updated concurrently inside the parallel region:
tid = omp_get_thread_num();
This can be solve by making tid
private to each thread for instance:
int tid = omp_get_thread_num();
Side Notes
In OpenMP the outermost loop enclosed by the #pragma omp for
will have its index variable (i
in this context) already private, therefore the clause private(i)
is not really necessary.
Another point is the schedule(dynamic,10)
; unless you are just playing around, it actually makes more sense (performance-wise) to use the schedule static
, since the code does not lead to loading balancing issues. The dynamic
schedule has the additional overhead of assigning at run-time the tasks to threads, whereas the assignment in the static
schedule is performed at compile-time.
Finally, the following :
tid = omp_get_thread_num();
if (tid == 0) {
nthreads = omp_get_num_threads();
cout << "Número de threads = " << nthreads <<endl;
}
can be simplified by using the OpenMP master clause, namely:
#pragma omp master
{
nthreads = omp_get_num_threads();
cout << "Número de threads = " << nthreads <<endl;
}
A running solution with the changes applied:
#include <omp.h>
#include <iostream>
using namespace std;
int main (int argc, char *argv[])
{
float total = 0.0;
#pragma omp parallel num_threads(4)
{
#pragma omp master
{
int nthreads = omp_get_num_threads();
cout << "Número de threads = " << nthreads <<endl;
}
#pragma omp barrier
#pragma omp for reduction(+:total)
for (int i=0; i < 1000; i++)
total = total + i*1.0;
}
cout << "Total = " <<total << endl;
return 0;
}
Output:
Número de threads = 4
Total = 499500