According to the OpenMP Specification (v4.0), the following program contains a possible data race due to unsynchronized read/write of i
:
int i{0}; // std::atomic<int> i{0};
void write() {
// #pragma omp atomic write // seq_cst
i = 1;
}
int read() {
int j;
// #pragma omp atomic read // seq_cst
j = i;
return j;
}
int main() {
#pragma omp parallel
{ /* code that calls both write() and read() */ }
}
Possible solutions that came to my mind are shown in the code as comments:
i
with #pragma omp atomic write/read
,i
with #pragma omp atomic write/read seq_cst
,std::atomic<int>
instead of int
as a type of i
.Here are the compilers-generated instructions on x86_64 (with -O2
in all cases):
GNU g++ 4.9.2: i = 1; j = i;
original code: MOV MOV
#pragma omp atomic: MOV MOV
// #pragma omp atomic seq_cst: MOV MOV
#pragma omp atomic seq_cst: MOV+MFENCE MOV (see UPDATE)
std::atomic<int>: MOV+MFENCE MOV
clang++ 3.5.0: i = 1; j = i;
original code: MOV MOV
#pragma omp atomic: MOV MOV
#pragma omp atomic seq_cst: MOV MOV
std::atomic<int>: XCHG MOV
Intel icpc 16.0.1: i = 1; j = i;
original code: MOV MOV
#pragma omp atomic: * *
#pragma omp atomic seq_cst: * *
std::atomic<int>: XCHG MOV
* Multiple instructions with calls to __kmpc_atomic_xxx functions.
What I wonder is why the GNU/clang compiler does not generate any special instructions for #pragma omp atomic
writes. I would expect similar instructions as for std::atomic
, i.e, either MOV+MFENCE
or XCHG
. Any explanation?
UPDATE
g++ 5.3.0 produces MFENCE
for #pragma omp atomic write seq_cst
. That is the correct behavior, I believe. Without seq_cst
, it produces plain MOV
, which is sufficient for non-SC atomicity.
There was a bug in my Makefile, g++ 4.9.2 produces MFENCE
for CS atomic write as well. Sorry guys for that.
Clang 3.5.0 does not implement the OpenMP SC atomics, thanks Hristo Iliev for pointing this out.
There are two possibilities.
The compiler is not obligated to convert C++ code containing a data race into bad machine code. Depending on the machine memory model, the instructions normally used may already be atomic and coherent. Take that same C++ code to another architecture and you may start seeing the pragmas cause differences that didn't exist on x86_64.
In addition to potentially causing use of different instructions and/or extra memory fence instructions, the atomic pragmas (as well std::atomic
and volatile
) also constrain the compiler's own code reordering optimizations. They may not apply to your simply case, but you certainly could see that common-subexpression elimination, including hoisting computations outside a loop, may be affected.