c++ multithreading atomic memory-barriers stdatomic

Is this memory order correct?


#include <atomic>
#include <cassert>

std::atomic<bool> x{ false };
std::atomic<bool> y{ false };

// thread 1
void thread1() {
    y.store(true, std::memory_order_seq_cst);
    x.store(true, std::memory_order_release);
}

// thread 2
void thread2() {
    while (!x.load(std::memory_order_relaxed)) {} // spin until x is true
    assert(y.load(std::memory_order_seq_cst)); // !!!
}

Can the assert fail? My understanding is that even though the load of x is relaxed, once thread 2 sees the write by thread 1, it can't see y as false, because the write to y happens before the write to x.

The memory orders are copied from a real-life case and could be weaker for this sample, but I left them unchanged so as not to miss any subtleties.


Solution

  • Yes, ISO C++ allows the y.load to take a value from before x.load saw true, because x.load isn't acquire or stronger.

    (More formally, it doesn't create a happens-before relationship between the writer and the reader that would put y.store before y.load.)

    On many ISAs (such as x86 or PowerPC) it wouldn't be possible to observe this in practice: on x86 because LoadLoad reordering isn't allowed in general, and on PowerPC because a seq_cst load involves some fencing before the load instruction (perhaps to block IRIW reordering), which ends up being sufficient to block LoadLoad reordering even with earlier relaxed loads. The mapping table at https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html shows PowerPC's load(seq_cst) as hwsync; ld; cmp; bc; isync. hwsync is a full barrier, the same instruction the table uses for atomic_thread_fence(seq_cst).

    But you might be able to observe it on AArch64, where LDR (load(relaxed)) can reorder with a later LDAR (load(seq_cst)). To actually see it, you probably need the variables in separate cache lines and the loop-exit branch to be correctly predicted. Otherwise the later SC load won't get into the pipeline until after the spin-loop load has produced a value (and branch-mispredict recovery has re-steered the front-end to fetch from the correct path).
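If the goal is simply for the assert to be guaranteed to hold, the usual options are to make the spin-loop load acquire, or to keep it relaxed and put an acquire fence after the loop. The sketch below shows both under that assumption; the helper names run_acquire_load and run_acquire_fence are hypothetical, and each runs one writer/reader pair and reports whether the reader saw y as true. Note that running the sketch on x86 says nothing about the original relaxed version, which x86 wouldn't reorder anyway.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Variant A (hypothetical helper): the spin-loop load is acquire. Once it reads
// the value written by the release store, x.store synchronizes with x.load, so
// y.store happens before y.load and y must be seen as true.
bool run_acquire_load() {
    std::atomic<bool> x{false};
    std::atomic<bool> y{false};
    bool seen_y = false;  // written by t2, read only after t2.join()

    std::thread t1([&] {
        y.store(true, std::memory_order_seq_cst);
        x.store(true, std::memory_order_release);
    });
    std::thread t2([&] {
        while (!x.load(std::memory_order_acquire)) {
            // spin until thread 1's release store is visible
        }
        seen_y = y.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    return seen_y;
}

// Variant B (hypothetical helper): the spin-loop load stays relaxed, followed by
// an acquire fence. A release store synchronizes with an acquire fence that is
// sequenced after a load reading the stored value, giving the same
// happens-before edge from y.store to y.load.
bool run_acquire_fence() {
    std::atomic<bool> x{false};
    std::atomic<bool> y{false};
    bool seen_y = false;

    std::thread t1([&] {
        y.store(true, std::memory_order_seq_cst);
        x.store(true, std::memory_order_release);
    });
    std::thread t2([&] {
        while (!x.load(std::memory_order_relaxed)) {
            // relaxed spin; ordering comes from the fence below
        }
        std::atomic_thread_fence(std::memory_order_acquire);
        seen_y = y.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    return seen_y;
}
```

Calling either helper from main and asserting on the result (for example assert(run_acquire_load());) should never fail on a conforming implementation. The fence variant can be handy when the spin loop is shared with code that doesn't need ordering on every iteration, since only the one fence after the loop pays the acquire cost.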