Consider the following code.
#include <atomic>
#include <cassert>
#include <thread>

struct foo {
    std::atomic<foo*> _next = nullptr;
};

foo dummy = {};
foo a = {};
std::atomic<foo*> b = nullptr;

void thread_0() {
    auto* old = b.load(std::memory_order_acquire); // (1)
    do {
        a._next.store(old, std::memory_order_relaxed); // First iteration: (2), Second iteration: (4)
    } while (!b.compare_exchange_weak(old, &a, std::memory_order_release, std::memory_order_relaxed)); // First iteration: (3), Second iteration: (5)
}

void thread_1() {
    // A plain store would be enough for the demonstration here, but then the
    // code may hang indefinitely when it is actually executed.
    // Just treat this as a plain store in this example.
    const auto* avoid_hang = b.exchange(&dummy, std::memory_order_relaxed); // (6)
    if (avoid_hang) { return; }
    while (b.load(std::memory_order_acquire) != &a) {} // (7)
    assert(a._next.load(std::memory_order_relaxed)); // Can this assert fire?
}

/*
void thread_1_ignore_hang() {
    b.store(&dummy, std::memory_order_relaxed); // (6)
    while (b.load(std::memory_order_acquire) != &a) {} // (7)
    assert(a._next.load(std::memory_order_relaxed)); // Can this assert fire?
}
*/

int main() {
    std::jthread t0(thread_0);
    std::jthread t1(thread_1);
    return 0;
}
And the following order of execution.
Thread 0                                                                  Thread 1
(1) old = nullptr (load b, mo_acquire)
(2) a->_next = nullptr                                                    (6) b = &dummy
(3) cmpxchg fails, old = &dummy (load b, mo_relaxed)
(4) a->_next = &dummy
(5) cmpxchg succeeds, b = &a (load b, mo_relaxed; store to b, mo_release)
                                                                          (7) (load b, mo_acquire)
AFAIK, a std::memory_order_release store (here introduced by compare_exchange_weak) only prevents reads/writes sequenced before it from being reordered after it. Then, from Thread 1's point of view, could (4) be reordered before (2), so that a._next is read as nullptr and the assert fires?

I can't really test this on my hardware, because on x86 all atomic loads/stores already have std::memory_order_acquire/std::memory_order_release semantics, respectively.
I think two separate questions are being conflated here.
First, to answer the title question, loops are irrelevant to memory ordering. All that matters is program order - "sequencing" in the language of the C++ standard. You will not find any mention of looping in the memory model sections of the standard. And at the level of the machine, for purposes of memory ordering, CPUs don't care what non-memory instructions are executed between two memory accesses; in particular they don't care if one of them is a branch.
That means that, in your example, (4) can be reordered before (3).
But it does not mean that (4) can be reordered before (2); this is not allowed. It has nothing to do with acquire/release ordering; it is simply the fact that every atomic variable has a modification order [intro.races p4], which must be consistent with the happens-before ordering. This is the write-write coherence rule [intro.races p15].
In particular, it must be consistent with sequencing. Since (2) is unquestionably sequenced before (4), the value nullptr must precede &dummy in the modification order of a._next.
Now the load in the assert (call it (8)) is sequenced after the acquire load (7), which takes its value &a from the release store (5), which is sequenced after the store (4). So store (4) happens before load (8). By write-read coherence [intro.races p18], load (8) must not return any value that precedes, in the modification order, the value stored by (4), namely &dummy. So load (8) cannot return nullptr, and the assert cannot fire.
In short, weak memory ordering only affects how loads and stores of different objects can be reordered. But two stores to the same object can never be reordered with each other.
For some intuition about this rule, think back to the reason we have memory reordering in the first place: a memory access can take a long time, and we'd like to go on and do more useful work while we're waiting for it to finish. So if you have

    a.store(5, std::memory_order_relaxed);
    b.store(10, std::memory_order_relaxed);

and the store to a is stalled, we might as well just put it in a store buffer and move on. If we're able to access b more quickly, then a weakly-ordered architecture will go ahead and do the store to b while a is still waiting.
But if instead we have

    a.store(5, std::memory_order_relaxed);
    // more code
    a.store(10, std::memory_order_relaxed);

and the first store is still waiting by the time we get to the second, the correct thing to do is to replace it in the store buffer with the second store. There's no guarantee that any other core would have happened to load a while it had the value 5 anyway, so we might as well not do the first store at all. And the store of 10 can't possibly be ready to commit before the store of 5, because they are both waiting for the same cache line to become available. So even if the memory ordering rules allowed the two stores to be reordered, doing so would provide no efficiency benefit, while being very confusing and making it hard to write correct code without sprinkling barriers everywhere.