I am studying OpenMP and have written an implementation of shaker sort. There are two consecutive loops here, and to make them run one after the other I added locks via omp_init_lock and omp_destroy_lock, but the result is still incorrect. Please tell me how to parallelize two consecutive loops. My code is below:
int Left, Right;
Left = 1;
Right = ARR_SIZE;
while (Left <= Right)
{
    omp_init_lock(&lock);
    #pragma omp parallel reduction(+:Left) num_threads(4)
    {
        #pragma omp for
        for (int i = Right; i >= Left; i--) {
            if (Arr[i - 1] > Arr[i]) {
                int temp;
                temp = Arr[i];
                Arr[i] = Arr[i - 1];
                Arr[i - 1] = temp;
            }
        }
        Left++;
    }
    omp_destroy_lock(&lock);

    omp_init_lock(&lock);
    #pragma omp parallel reduction(+:Right) num_threads(4)
    {
        #pragma omp for
        for (int i = Left; i <= Right; i++) {
            if (Arr[i - 1] > Arr[i]) {
                int temp;
                temp = Arr[i];
                Arr[i] = Arr[i - 1];
                Arr[i - 1] = temp;
            }
        }
        Right--;
    }
    omp_destroy_lock(&lock);
}
You seem to have several misconceptions about how OpenMP works.
Your code looks like you expected the two parallel regions to behave like #pragma omp sections. Side note: unless you have absolutely no other choice and/or you know exactly what you are doing, don't use sections. They don't scale well.
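For reference only (not a recommendation), here is a minimal sketch of what sections actually does: each block is executed exactly once, possibly by different threads running at the same time. The function name and the printf output are just placeholders for this illustration.

#include <cstdio>

void sections_demo()
{
    #pragma omp parallel sections
    {
        #pragma omp section
        { std::printf("block A\n"); }   // executed once, by some thread

        #pragma omp section
        { std::printf("block B\n"); }   // may run concurrently with block A
    }
}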
omp_init_lock initializes a lock object; it doesn't acquire it. Likewise, omp_destroy_lock deallocates the lock object; it doesn't release the lock. If you ever want to acquire a lock, use omp_set_lock and omp_unset_lock on a lock that you initialize once before you enter the parallel section. Generally speaking, if you need a lock around an extended section of your code, that code will not parallelize. Read up on Amdahl's law. Locks are only useful if they are taken rarely or if the chance of two threads competing for the same lock at the same time is low.
Your code contains race conditions. Since you used #pragma omp for, two different threads may execute the i-th and (i-1)-th iterations at the same time. That means they will touch the same array elements. That's undefined behavior and will lead to them stepping on each other's toes, so to speak.
I have no idea what you wanted to do with those reductions. As written, each thread gets a private copy of Left (or Right) initialized to 0, the identity element for +, so the loop bounds inside the region don't even see the value you set outside, and when the region ends the shared variable advances by one per thread rather than by one.
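For contrast, this is the kind of thing reduction is meant for: combining per-thread partial results, for example a parallel sum. This is just an illustrative sketch, not part of your sort.

long parallel_sum(const int* arr, int n)
{
    long sum = 0;                    // each thread gets a private copy, initialized to 0
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += arr[i];               // the private copies are added together at the end
    return sum;
}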
Well, traditional shaker sort cannot work in parallel because within one iteration of the outer loop, an element may travel the whole distance to the end of the range. That requires an amount of inter-thread coordination that is infeasible.
What you can do is a variation of bubble sort, known as odd-even transposition sort, where each thread looks at one pair of adjacent values and swaps them if they are out of order. Shift this pairing back and forth by one position and values will slowly travel towards their correct positions.
This should work:
#include <utility>
// using std::swap
void shake_sort(int* arr, int n) noexcept
{
    using std::swap;
    const int even_to_odd = n / 2;        // number of (even, odd) index pairs
    const int odd_to_even = (n - 1) / 2;  // number of (odd, even) index pairs
    bool any_swap;
    do {
        any_swap = false;
        // compare-and-swap the pairs (0,1), (2,3), (4,5), ...
#       pragma omp parallel for reduction(|:any_swap)
        for(int i = 0; i < even_to_odd; ++i) {
            int left = i * 2;
            int right = left + 1;
            if(arr[left] > arr[right]) {
                swap(arr[left], arr[right]);
                any_swap = true;
            }
        }
        // compare-and-swap the pairs (1,2), (3,4), (5,6), ...
#       pragma omp parallel for reduction(|:any_swap)
        for(int i = 0; i < odd_to_even; ++i) {
            int left = i * 2 + 1;
            int right = left + 1;
            if(arr[left] > arr[right]) {
                swap(arr[left], arr[right]);
                any_swap = true;
            }
        }
    } while(any_swap);
}
Note that you cannot shrink the range at the left and right borders the way classic shaker sort does, because one outer iteration cannot guarantee that the values at the borders are already in their final positions.
Two more notes: std::swap makes the code more readable, and I dropped the num_threads clause; OpenMP can figure that out itself.
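In case it helps, a small usage sketch; the values are arbitrary, and you need to build with OpenMP enabled, e.g. -fopenmp on GCC/Clang.

#include <cstdio>

int main()
{
    int data[] = { 5, 1, 4, 2, 8, 7, 3, 6, 0, 9 };
    shake_sort(data, 10);
    for (int v : data)
        std::printf("%d ", v);       // prints 0 1 2 3 4 5 6 7 8 9
    std::printf("\n");
}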