#include <atomic>
#include <cassert>

std::atomic<bool> x{ false };
std::atomic<bool> y{ false };

void thread1() {
    y.store(true, std::memory_order_seq_cst);
    x.store(true, std::memory_order_release);
}

void thread2() {
    while (!x.load(std::memory_order_relaxed)) {}  // spin until x is true
    assert(y.load(std::memory_order_seq_cst));     // !!!
}
Can the assert fail? My understanding is: although the load of x is "relaxed", once "thread 2" sees the write by "thread 1", it can't see y as false, because the write to y happens before the write to x.
The memory orders are copied from a real-life case and could be weaker for this sample, but I left them unchanged so as not to miss any subtleties.
Yes. ISO C++ allows y.load to take a value from before x.load saw true, because x.load isn't acquire or stronger.
(More formally, a relaxed load that reads the value of a release store does not synchronize with it, so there is no happens-before relationship between the writer and the reader that would put y.store before y.load.)
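A minimal sketch of the usual fix, assuming the intent is for the assert to be guaranteed: make the spin-loop load acquire. Once that load reads true, it synchronizes with the release store to x, so y.store happens before y.load. The helper run_acquire_version is hypothetical (not from the question) and uses lambdas in place of the original thread bodies.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the question's scenario once with an acquire load
// in the spin loop and returns what the reader observed for y.
bool run_acquire_version() {
    std::atomic<bool> x{false};
    std::atomic<bool> y{false};
    bool seen_y = false;  // written only by the reader, read here after join()

    std::thread writer([&] {
        y.store(true, std::memory_order_seq_cst);
        x.store(true, std::memory_order_release);  // release: publishes the store to y
    });
    std::thread reader([&] {
        // acquire: the load that reads true synchronizes with the release store,
        // so everything sequenced before that store (including y.store) is visible.
        while (!x.load(std::memory_order_acquire)) {
        }
        seen_y = y.load(std::memory_order_seq_cst);
        assert(seen_y);  // can no longer fail
    });
    writer.join();
    reader.join();
    return seen_y;
}
```

The design choice is the simplest one: every load in the spin loop becomes acquire, which on x86 compiles to the same plain load anyway.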
On many ISAs (such as x86 or PowerPC) it wouldn't be possible to observe this in practice: on x86 because LoadLoad reordering isn't allowed in general, and on PowerPC because a seq_cst load involves some fencing before the instruction (perhaps to block IRIW reordering), which ends up being sufficient to block LoadLoad reordering even with earlier relaxed loads. https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html shows PowerPC's load(seq_cst) being hwsync; ld; cmp; bc; isync. hwsync is a full barrier, the same instruction the mapping uses for atomic_thread_fence(seq_cst).
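In ISO C++ you don't have to rely on that kind of implementation detail. If the spin loop should stay relaxed, a portable alternative is one std::atomic_thread_fence(std::memory_order_acquire) after the loop: a release store synchronizes with an acquire fence that is sequenced after an atomic load reading the stored value. This is a sketch; run_fence_version is a hypothetical helper name.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical variant: keep the spin-loop load relaxed, then add a single
// acquire fence after the loop instead of making every load in the loop acquire.
bool run_fence_version() {
    std::atomic<bool> x{false};
    std::atomic<bool> y{false};
    bool seen_y = false;  // written only by the reader, read here after join()

    std::thread writer([&] {
        y.store(true, std::memory_order_seq_cst);
        x.store(true, std::memory_order_release);
    });
    std::thread reader([&] {
        while (!x.load(std::memory_order_relaxed)) {
        }
        // The last relaxed load read the value written by the release store,
        // so that store synchronizes with this acquire fence.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen_y = y.load(std::memory_order_seq_cst);
        assert(seen_y);  // cannot fail with the fence in place
    });
    writer.join();
    reader.join();
    return seen_y;
}
```

The fence is paid once after the loop exits rather than on every iteration, which can matter on weakly-ordered ISAs where acquire loads are not free.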
But you might be able to observe this on AArch64, where LDR (load(relaxed)) can reorder with a later LDAR (load(seq_cst)). To actually see it, you probably need the variables in separate cache lines, and the loop-exit branch to be correctly predicted; otherwise the later SC load won't get into the pipeline until after the spin-loop load has produced a value (and branch-mispredict recovery has re-steered the front-end to fetch from the correct path).
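A rough litmus-test sketch for trying to observe the reordering, assuming 64-byte cache lines (the alignas(64) value is an assumption). It keeps the question's relaxed load, counts how often the reader sees y == false after seeing x == true instead of asserting, and releases both threads from a shared start flag so they are more likely to overlap. PaddedFlag and count_reorderings are hypothetical names. Spawning threads per iteration is slow and makes tight overlap rare, so a count of zero proves nothing, and on x86 you should not expect any hits at all.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Each flag gets its own cache line (64 bytes is an assumption about the target).
struct alignas(64) PaddedFlag {
    std::atomic<bool> v{false};
};

// Runs the question's pattern `iterations` times and returns how many times the
// reader saw y == false after x == true, i.e. how often the assert would have fired.
int count_reorderings(int iterations) {
    int failures = 0;
    for (int i = 0; i < iterations; ++i) {
        PaddedFlag x, y;
        std::atomic<bool> go{false};  // start flag so both threads begin together
        bool saw_y = true;            // written by the reader, read after join()

        std::thread writer([&] {
            while (!go.load(std::memory_order_acquire)) {
            }
            y.v.store(true, std::memory_order_seq_cst);
            x.v.store(true, std::memory_order_release);
        });
        std::thread reader([&] {
            while (!go.load(std::memory_order_acquire)) {
            }
            while (!x.v.load(std::memory_order_relaxed)) {  // relaxed, as in the question
            }
            saw_y = y.v.load(std::memory_order_seq_cst);
        });
        go.store(true, std::memory_order_release);
        writer.join();
        reader.join();
        if (!saw_y) ++failures;
    }
    return failures;
}
```

A serious attempt would keep two long-lived threads and synchronize iterations between them instead of creating threads in the loop; this version trades hit rate for simplicity.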