I am trying to write a C (gcc) function that calculates the maximum of an array of doubles using multiple threads. I create an array of size omp_get_num_threads(), in which I store the local maximum of each thread, and finally take the maximum of this small array. The code is (more or less) the following:
#define N 100000000 //made up size
int i;
double *local_max;
double A[N];
#pragma omp parallel
{
    #pragma omp master
    {
        local_max=(double *)calloc(omp_get_num_threads(),sizeof(double));
    }
    #pragma omp flush //so that all threads point to the correct location of local_max
    #pragma omp for
    for(i=0;i<N;i++){
        if(A[i]>local_max[omp_get_thread_num()])
            local_max[omp_get_thread_num()]=A[i];
    }
}
free(local_max);
This, however, leads to segfaults, and valgrind complains about the use of uninitialized variables. It turns out that local_max has not actually been updated in all threads before they enter the for construct. I thought #pragma omp flush should take care of that? If I replace it with #pragma omp barrier, everything works fine.
Could someone explain to me what is going on?
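You need a barrier to ensure the memory allocation has completed before any thread uses local_max. The master construct has no implied barrier, and #pragma omp flush does not make threads wait for each other: it only makes the calling thread's view of shared memory consistent at that point. So the other threads can pass the flush and start the for loop while the master is still inside calloc (or has not even reached it yet), and they read local_max before it points to allocated memory. A barrier makes every thread wait until all threads, including the master, have arrived, and it implies a flush. Because memory allocation takes some time, this race is easy to hit. I modified your code below to demonstrate the behavior.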
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <omp.h>
using namespace std;

int main()
{
    int i;
    double *local_max;
    omp_set_num_threads(8);
    #pragma omp parallel
    {
        #pragma omp master
        {
            for(int k = 0; k < 999999; k++) {} // Lazy man's sleep function (an optimizer may remove it)
            cout << "Master start allocating" << endl;
            local_max=(double *)calloc(omp_get_num_threads(),sizeof(double));
            cout << "Master finish allocating" << endl;
        }
        #pragma omp flush // does NOT make the other threads wait for the master
        #pragma omp for
        for(i=0;i<10;i++){
            cout << "for : " << omp_get_thread_num() << " i: " << i << endl;
        }
    }
    free(local_max);
    getchar();
    return 0;
}