Search code examples
calgorithmsortingparallel-processingopenmp

Parallel Bubble Sort is blocked


My OpenMP program blocks on the first "for" loop of the following code, without any apparent reason. I'm simply trying to parallelize a Bubble Sort.

Below is a complete code reproducing the issue :

#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>
#include <omp.h>

static int N_THREADS;
#define CHUNK_SIZE (size/N_THREADS)

void
parallel_bubble_sort(uint64_t *T, const uint64_t size)
{
    register bool swapped;
    register uint64_t swap;
    register int i,j;

    #pragma omp parallel private(swap,i,j)
    do {
        swapped = false;

        #pragma omp for schedule(static) reduction(||:swapped)
        for (j=0; j<N_THREADS; j++)
        for (i=j*CHUNK_SIZE+1; i<=(j+1)*CHUNK_SIZE-1; i++)
        if (T[i-1] > T[i]) {
            swap = T[i-1];
            T[i-1] = T[i];
            T[i] = swap;
            swapped = true;
        }

        #pragma omp for schedule(static) reduction(||:swapped)
        for (i=CHUNK_SIZE-1; i<size-CHUNK_SIZE; i+=CHUNK_SIZE)
        if (T[i] > T[i+1]) {
            swap = T[i];
            T[i] = T[i+1];
            T[i+1] = swap;
            swapped = true;
        }
    } while(swapped);
}

int main ()
{
    uint64_t i;
    uint64_t N = 1024;
    N_THREADS = omp_get_max_threads();

    uint64_t *X = (uint64_t *) malloc(N * sizeof(uint64_t));
    for (i = 0 ; i < N ; i++) X[i] = N-i;

    parallel_bubble_sort(X, N);
    free(X);
}

Some additional context:

  • T* is a pointer to an array of type uint64_t
  • size is the size of the array
  • CHUNK_SIZE is simply size/NUM_THREADS (which is also the default chunk size value OpenMP uses in the static scheduling mode, so I should get the same behavior if I remove this from the clause)

Regarding the logic behind the code:

  • In the first loop, I divide my array into chunks, and I propagate bubbles separately without overlap between threads
  • In the second loop, I make sure the bubbles propagate at the borders

More details about the issue I have while executing:

  • My program is stuck on the first "for" loop. I have localized where the program blocks using #pragma omp single, and a simple print statement.

Solution

  • The cause of the deadlock is a data race condition in your outermost loop:

    do {
       swapped = false;  // <--- here
       ...
    } while(swapped);    // <--- here
    

    The race happens becase there is no guarantee that all threads will arrive at the instruction implementing the while(swapped) conditional at the same time. Imagine you have two threads. Thread 0 finishes the second parallel loop, sees that swapped is true, passes through the loop conditional and then starts again the loop body by setting swapped to false. If thread 1 reaches the the conditional before thread 0 was able to set swapped to false, it too will start a new iteration. But if arrives a bit too late, swapped will be false and the loop will terminate. As a result, thread 1 will not join the parallel loop and thread 0 will forever wait at the implicit synchronisation barrier.

    The solution is to make sure that all threads have a consistent view of what the value of swapped is when they make a decision whether to start a new iteration or not. The simplest solution is to insert a barrier right before setting swapped to false:

    do {
       #pragma omp barrier
       swapped = false;
       ...
    } while(swapped);
    

    Also, having all threads reset swapped is not really necessary and it may possibly (not sure about it) go against the OpenMP spec that forbids concurrent access to the original variable before the reduction is complete. I'm not sure if it applies to modifications before the reduction region (as I wasn't sure a couple of years ago) and there was a paragraph deleted from the OpenMP 4.5 spec regarding concurrent access, but just to be safe, I'd give the assingment the single treatment:

    do {
       #pragma omp barrier
       #pragma omp single
       swapped = false;
       ...
    } while(swapped);