Tags: c, gcc, openmp, undefined-behavior, icc

Can I put multiple ordered statements in one ordered for loop (OpenMP)?


I just found out that while this C code gives an ordered list of integers (as expected):

#include <stdio.h>
#include <unistd.h>
#include <omp.h>

int main() {
#pragma omp parallel for ordered schedule(dynamic)
  for (int i=0; i<10; i++) {
#pragma omp ordered
    {
    printf("%i             (tid=%i)\n",i,omp_get_thread_num()); fflush(stdout);
    }
  }
}

With both gcc and icc, the following gives undesired behaviour:

#include <stdio.h>
#include <unistd.h>
#include <omp.h>

int main() {
#pragma omp parallel for ordered schedule(dynamic)
  for (int i=0; i<10; i++) {
#pragma omp ordered
    {
    printf("%i             (tid=%i)\n",i,omp_get_thread_num()); fflush(stdout);
    }

    usleep(100*omp_get_thread_num());
    printf("WORK IS DONE  (tid=%i)\n",omp_get_thread_num()); fflush(stdout);
    usleep(100*omp_get_thread_num());

#pragma omp ordered
    {
    printf("  %i           (tid=%i)\n",i,omp_get_thread_num()); fflush(stdout);
    }
  }
} 

What I'd love to see is:
0
1
2
3
4
5
6
7
8
9
WORK IS DONE
WORK IS DONE
WORK IS DONE
WORK IS DONE
WORK IS DONE
WORK IS DONE
WORK IS DONE
WORK IS DONE
WORK IS DONE
WORK IS DONE
0
1
2
3
4
5
6
7
8
9

But with gcc I get:
0 (tid=5)
WORK IS DONE (tid=5)
0 (tid=5)
1 (tid=2)
WORK IS DONE (tid=2)
1 (tid=2)
2 (tid=0)
WORK IS DONE (tid=0)
2 (tid=0)
3 (tid=6)
WORK IS DONE (tid=6)
3 (tid=6)
4 (tid=7)
WORK IS DONE (tid=7)
4 (tid=7)
5 (tid=3)
WORK IS DONE (tid=3)
5 (tid=3)
6 (tid=4)
WORK IS DONE (tid=4)
6 (tid=4)
7 (tid=1)
WORK IS DONE (tid=1)
7 (tid=1)
8 (tid=5)
WORK IS DONE (tid=5)
8 (tid=5)
9 (tid=2)
WORK IS DONE (tid=2)
9 (tid=2)
(so everything gets ordered, even the parallelizable work part)

And with icc:
1 (tid=0)
2 (tid=5)
3 (tid=1)
4 (tid=2)
WORK IS DONE (tid=1)
WORK IS DONE (tid=3)
3 (tid=1)
6 (tid=4)
7 (tid=7)
8 (tid=1)
WORK IS DONE (tid=0)
5 (tid=6)
WORK IS DONE (tid=2)
1 (tid=0)
9 (tid=0)
WORK IS DONE (tid=0)
WORK IS DONE (tid=5)
WORK IS DONE (tid=1)
9 (tid=0)
0 (tid=3)
8 (tid=1)
WORK IS DONE (tid=4)
WORK IS DONE (tid=6)
2 (tid=5)
WORK IS DONE (tid=7)
6 (tid=4)
5 (tid=6)
4 (tid=2)
7 (tid=7)
(so nothing gets ordered, not even the ordered clauses)

Is using multiple ordered clauses within one ordered loop undefined behaviour, or what is going on here? I couldn't find anything disallowing multiple clauses per loop in any of the OpenMP documentation I looked at.

I know that in this trivial example I could just split the loop like this:

int main() {  
  for (int i=0; i<10; i++) {  
    printf("%i             (tid=%i)\n",i,omp_get_thread_num()); fflush(stdout);  
  }  
#pragma omp parallel for schedule(dynamic)  
  for (int i=0; i<10; i++) {  
    usleep(100*omp_get_thread_num());  
    printf("WORK IS DONE  (tid=%i)\n",omp_get_thread_num()); fflush(stdout);  
    usleep(100*omp_get_thread_num());  
  }  
  for (int i=0; i<10; i++) {  
    printf("  %i           (tid=%i)\n",i,omp_get_thread_num()); fflush(stdout);  
  }          
}  

So I'm not looking for a workaround. I really want to understand what is going on here, so that I can handle the real situation without running into anything devastating/unexpected.

I really hope you can help me.


Solution

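  • According to the OpenMP 4.0 API specification, you can't.

    Only one ordered clause can appear on a loop directive (p. 58)

    The restrictions on the ordered construct in the same specification also state that, during execution of an iteration of a loop, a thread must not execute more than one ordered region that binds to the same loop region. The second program executes two ordered regions per iteration, so it is non-conforming, which would explain why gcc and icc each do something different with it.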
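
For the real situation, one conforming layout is to keep a single parallel region but give each phase its own worksharing loop, so that no iteration executes more than one ordered region. The following is only a minimal sketch under that assumption, not the asker's actual code: run_phases and its order1/order2 buffers are hypothetical names introduced here to record the order in which the ordered regions run, and the usleep calls from the question are left out for brevity.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch still builds when OpenMP is not enabled. */
static int omp_get_thread_num(void) { return 0; }
#endif

/*
 * Hypothetical helper (not from the original post).
 * Runs the three phases as three worksharing loops inside one parallel
 * region. order1 and order2 must each hold n ints; they record the order
 * in which the ordered regions were executed.
 * Returns the total number of ordered regions executed.
 */
int run_phases(int n, int *order1, int *order2)
{
    /* Shared counters; they are only modified inside ordered regions,
       which execute one at a time in iteration order. */
    int n1 = 0, n2 = 0;

#pragma omp parallel
    {
        /* Phase 1: exactly one ordered region per iteration. */
#pragma omp for ordered schedule(dynamic)
        for (int i = 0; i < n; i++) {
#pragma omp ordered
            {
                order1[n1++] = i;
                printf("%i             (tid=%i)\n", i, omp_get_thread_num());
                fflush(stdout);
            }
        }
        /* The implicit barrier at the end of each "omp for" separates the phases. */

        /* Phase 2: the parallelizable work, with no ordering imposed. */
#pragma omp for schedule(dynamic)
        for (int i = 0; i < n; i++) {
            printf("WORK IS DONE  (tid=%i)\n", omp_get_thread_num());
            fflush(stdout);
        }

        /* Phase 3: again exactly one ordered region per iteration. */
#pragma omp for ordered schedule(dynamic)
        for (int i = 0; i < n; i++) {
#pragma omp ordered
            {
                order2[n2++] = i;
                printf("  %i           (tid=%i)\n", i, omp_get_thread_num());
                fflush(stdout);
            }
        }
    }
    return n1 + n2;
}
```

Calling run_phases(10, a, b) with two int[10] buffers from main and compiling with, for example, gcc -fopenmp should print the first ordered block, then the ten WORK IS DONE lines in arbitrary order, then the second ordered block. The price of this layout is a barrier between phases: all threads finish one phase before any thread starts the next.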