Search code examples
c++openmpx86-64atomicmemory-fences

OpenMP atomic and non-atomic reads/writes produce the same instructions on x86_64


According to the OpenMP Specification (v4.0), the following program contains a possible data race due to unsynchronized read/write of i:

int i{0}; // std::atomic<int> i{0};

void write() {
// #pragma omp atomic write // seq_cst
   i = 1;
}

int read() {
   int j;
// #pragma omp atomic read // seq_cst
   j = i; 
   return j;
}

int main() {
   #pragma omp parallel
   { /* code that calls both write() and read() */ }
}

Possible solutions that came to my mind are shown in the code as comments:

  1. to protect write and read of i with #pragma omp atomic write/read,
  2. to protect write and read of i with #pragma omp atomic write/read seq_cst,
  3. to use std::atomic<int> instead of int as a type of i.

Here are the compilers-generated instructions on x86_64 (with -O2 in all cases):

GNU g++ 4.9.2:               i = 1;        j = i;
original code:               MOV           MOV
#pragma omp atomic:          MOV           MOV
// #pragma omp atomic seq_cst:  MOV           MOV
#pragma omp atomic seq_cst:  MOV+MFENCE    MOV    (see UPDATE)
std::atomic<int>:            MOV+MFENCE    MOV

clang++ 3.5.0:               i = 1;        j = i;
original code:               MOV           MOV
#pragma omp atomic:          MOV           MOV
#pragma omp atomic seq_cst:  MOV           MOV
std::atomic<int>:            XCHG          MOV

Intel icpc 16.0.1:           i = 1;        j = i;
original code:               MOV           MOV
#pragma omp atomic:          *             *
#pragma omp atomic seq_cst:  *             *
std::atomic<int>:            XCHG          MOV

* Multiple instructions with calls to __kmpc_atomic_xxx functions.

What I wonder is why the GNU/clang compiler does not generate any special instructions for #pragma omp atomic writes. I would expect similar instructions as for std::atomic, i.e, either MOV+MFENCE or XCHG. Any explanation?

UPDATE

g++ 5.3.0 produces MFENCE for #pragma omp atomic write seq_cst. That is the correct behavior, I believe. Without seq_cst, it produces plain MOV, which is sufficient for non-SC atomicity.

There was a bug in my Makefile, g++ 4.9.2 produces MFENCE for CS atomic write as well. Sorry guys for that.

Clang 3.5.0 does not implement the OpenMP SC atomics, thanks Hristo Iliev for pointing this out.


Solution

  • There are two possibilities.

    1. The compiler is not obligated to convert C++ code containing a data race into bad machine code. Depending on the machine memory model, the instructions normally used may already be atomic and coherent. Take that same C++ code to another architecture and you may start seeing the pragmas cause differences that didn't exist on x86_64.

    2. In addition to potentially causing use of different instructions and/or extra memory fence instructions, the atomic pragmas (as well std::atomic and volatile) also constrain the compiler's own code reordering optimizations. They may not apply to your simply case, but you certainly could see that common-subexpression elimination, including hoisting computations outside a loop, may be affected.