Parallel Recursive Bitonic Sort slower than Serial (OpenMP)

My OpenMP (C) parallel code is slower than the serial bitonic sort code and although I have done all the possible combinations I could think of, I can't find the problem.

Here is part of the code:

void recBitonicSortOpenMP( int lo, int cnt, int dir ) {
   if (  cnt >  1 ) {
         int k = cnt / 2;
         #pragma omp parallel sections 

         #pragma omp section 
         recBitonicSortOpenMP( lo,     k, ASCENDING );

         #pragma omp section    
         recBitonicSortOpenMP( lo + k, k, DESCENDING );

         bitonicMergeOpenMP( lo, cnt, dir );



It is an implementation of Bitonic Sort.


  • For recursive problems like this use OpenMP tasks, not nested parallelism and sections. gives you a good example to follow...