#include <atomic>
#include <thread>
void test_relaxed()
{
    using namespace std;
    atomic<int> x{0};
    atomic<int> y{0};
    std::thread t1([&] {
        auto r1 = y.load(memory_order_relaxed); //a
        x.store(r1, memory_order_relaxed); //b
    });
    std::thread t2([&] {
        auto r2 = x.load(memory_order_relaxed); //c
        y.store(42, memory_order_relaxed); //d
    });
    t1.join();
    t2.join();
}
According to cppreference (in a relaxed ordering example), the above code is allowed to produce r1 == r2 == 42.
But I have tested it on x86-64 and arm64 platforms and I cannot get this result. Is there any way to get it in practice with real compilers and CPUs?
(Godbolt)
According to the ARM Memory Tool (the article, the online tool), arm64 allows this behavior (which means it might occur on some arm64 CPUs).
The following is a litmus test for your example:
AArch64 SO-q-2023-03-06
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X3=x;
}
 P0          | P1          ;
 LDR W0,[X1] | LDR W0,[X1] ;
 STR W0,[X3] | MOV W2,#42  ;
             | STR W2,[X3] ;
exists
(0:X0=42 /\ 1:X0=42)
You can try it yourself in the online tool.
However, it may be hard to find arm64 hardware that actually exhibits this behavior.
I have no data for arm64, but there is such data for ARMv7 (it might give you some insight, or you might want to try to reproduce your example on ARMv7).
Your test case is very similar to the LB+data+po litmus test.
The results for this litmus test on different hardware are here: as you can see, the behavior reproduces only on some hardware.
The meaning of the hardware abbreviations used in the table is given here.
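If you want to check a particular machine yourself, one option is to run the two threads from the question many times and count how often r1 == r2 == 42 shows up. Below is a minimal sketch of such a harness, not a definitive test. The function name count_lb_outcomes and the go/finished handshake are my own assumptions (they are not part of the question's code), and a result of zero does not prove that the hardware forbids the outcome.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

// Hypothetical helper (not from the question): runs the load-buffering test
// `iterations` times on two long-lived threads and returns how many rounds
// ended with r1 == 42 && r2 == 42.
std::size_t count_lb_outcomes(std::size_t iterations)
{
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    std::atomic<std::size_t> go{0};   // number of the round that may start
    std::atomic<int> finished{0};     // workers done with the current round
    int r1 = 0;
    int r2 = 0;

    std::thread t1([&] {
        for (std::size_t i = 1; i <= iterations; ++i) {
            // Wait for the round to start. The yield is a courtesy to
            // machines with few cores; removing it may tighten the timing.
            while (go.load(std::memory_order_acquire) != i)
                std::this_thread::yield();
            r1 = y.load(std::memory_order_relaxed);   // a
            x.store(r1, std::memory_order_relaxed);   // b
            finished.fetch_add(1, std::memory_order_release);
        }
    });
    std::thread t2([&] {
        for (std::size_t i = 1; i <= iterations; ++i) {
            while (go.load(std::memory_order_acquire) != i)
                std::this_thread::yield();
            r2 = x.load(std::memory_order_relaxed);   // c
            y.store(42, std::memory_order_relaxed);   // d
            finished.fetch_add(1, std::memory_order_release);
        }
    });

    std::size_t hits = 0;
    for (std::size_t i = 1; i <= iterations; ++i) {
        // Reset the shared state, then release both workers into round i.
        x.store(0, std::memory_order_relaxed);
        y.store(0, std::memory_order_relaxed);
        finished.store(0, std::memory_order_relaxed);
        go.store(i, std::memory_order_release);

        // The acquire load that reads 2 synchronizes with both workers,
        // so reading the plain ints r1 and r2 afterwards is race-free.
        while (finished.load(std::memory_order_acquire) != 2)
            std::this_thread::yield();
        if (r1 == 42 && r2 == 42)
            ++hits;
    }

    t1.join();
    t2.join();
    return hits;
}
```

Calling count_lb_outcomes with a large iteration count and printing the result gives a rough idea of whether the outcome is observable on that machine. Keep in mind that the compiler is also allowed to reorder the independent relaxed load and store in the second thread, so the result can depend on the compiler and optimization level as well as on the CPU.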