My OpenMP program blocks on the first "for" loop of the following code, without any apparent reason. I'm simply trying to parallelize a Bubble Sort.
Below is a complete code reproducing the issue :
#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>
#include <omp.h>
static int N_THREADS;
#define CHUNK_SIZE (size/N_THREADS)
void
parallel_bubble_sort(uint64_t *T, const uint64_t size)
{
register bool swapped;
register uint64_t swap;
register int i,j;
#pragma omp parallel private(swap,i,j)
do {
swapped = false;
#pragma omp for schedule(static) reduction(||:swapped)
for (j=0; j<N_THREADS; j++)
for (i=j*CHUNK_SIZE+1; i<=(j+1)*CHUNK_SIZE-1; i++)
if (T[i-1] > T[i]) {
swap = T[i-1];
T[i-1] = T[i];
T[i] = swap;
swapped = true;
}
#pragma omp for schedule(static) reduction(||:swapped)
for (i=CHUNK_SIZE-1; i<size-CHUNK_SIZE; i+=CHUNK_SIZE)
if (T[i] > T[i+1]) {
swap = T[i];
T[i] = T[i+1];
T[i+1] = swap;
swapped = true;
}
} while(swapped);
}
int main ()
{
uint64_t i;
uint64_t N = 1024;
N_THREADS = omp_get_max_threads();
uint64_t *X = (uint64_t *) malloc(N * sizeof(uint64_t));
for (i = 0 ; i < N ; i++) X[i] = N-i;
parallel_bubble_sort(X, N);
free(X);
}
Some additional context:
Regarding the logic behind the code:
More details about the issue I have while executing:
The cause of the deadlock is a data race condition in your outermost loop:
do {
swapped = false; // <--- here
...
} while(swapped); // <--- here
The race happens becase there is no guarantee that all threads will arrive at the instruction implementing the while(swapped)
conditional at the same time. Imagine you have two threads. Thread 0 finishes the second parallel loop, sees that swapped
is true
, passes through the loop conditional and then starts again the loop body by setting swapped
to false
. If thread 1 reaches the the conditional before thread 0 was able to set swapped
to false
, it too will start a new iteration. But if arrives a bit too late, swapped
will be false
and the loop will terminate. As a result, thread 1 will not join the parallel loop and thread 0 will forever wait at the implicit synchronisation barrier.
The solution is to make sure that all threads have a consistent view of what the value of swapped
is when they make a decision whether to start a new iteration or not. The simplest solution is to insert a barrier right before setting swapped
to false
:
do {
#pragma omp barrier
swapped = false;
...
} while(swapped);
Also, having all threads reset swapped
is not really necessary and it may possibly (not sure about it) go against the OpenMP spec that forbids concurrent access to the original variable before the reduction is complete. I'm not sure if it applies to modifications before the reduction region (as I wasn't sure a couple of years ago) and there was a paragraph deleted from the OpenMP 4.5 spec regarding concurrent access, but just to be safe, I'd give the assingment the single
treatment:
do {
#pragma omp barrier
#pragma omp single
swapped = false;
...
} while(swapped);