I'm trying to understand the purpose of std::atomic_thread_fence(std::memory_order_seq_cst);
fences, and how they're different from acq_rel
fences.
So far my understanding is that the only difference is that seq-cst fences affect the global order of seq-cst operations ([atomics.order]/4
). And said order can only be observed if you actually perform seq-cst loads.
So I'm thinking that if I have no seq-cst loads, then I can replace all my seq-cst fences with acq-rel fences without changing the behavior. Is that correct?
And if that's correct, why am I seeing code like this "implementation Dekker's algorithm with Fences", that uses seq-cst fences, while keeping all atomic reads/writes relaxed? Here's the code from that blog post:
std::atomic<bool> flag0(false),flag1(false);
std::atomic<int> turn(0);
void p0()
{
flag0.store(true,std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_seq_cst);
while (flag1.load(std::memory_order_relaxed))
{
if (turn.load(std::memory_order_relaxed) != 0)
{
flag0.store(false,std::memory_order_relaxed);
while (turn.load(std::memory_order_relaxed) != 0)
{
}
flag0.store(true,std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_seq_cst);
}
}
std::atomic_thread_fence(std::memory_order_acquire);
// critical section
turn.store(1,std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_release);
flag0.store(false,std::memory_order_relaxed);
}
void p1()
{
flag1.store(true,std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_seq_cst);
while (flag0.load(std::memory_order_relaxed))
{
if (turn.load(std::memory_order_relaxed) != 1)
{
flag1.store(false,std::memory_order_relaxed);
while (turn.load(std::memory_order_relaxed) != 1)
{
}
flag1.store(true,std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_seq_cst);
}
}
std::atomic_thread_fence(std::memory_order_acquire);
// critical section
turn.store(0,std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_release);
flag1.store(false,std::memory_order_relaxed);
}
As I understand it, they're not the same, and a counterexample is below. I believe the error in your logic is here:
And said order can only be observed if you actually perform seq-cst loads.
I don't think that's true. In atomics.order p4 which defines the
axioms of the sequential consistency total order S, items 2-4 all may
involve operations which are not seq_cst
. You can observe the
coherence ordering between such operations, and this can let you infer
how the seq_cst
operations have been ordered.
As an example, consider the following version of the StoreLoad litmus test, akin to Peterson's algorithm:
std::atomic<bool> a,b; // initialized to false
void thr1() {
a.store(true, std::memory_order_seq_cst);
std::atomic_thread_fence(std::memory_order_seq_cst);
if (b.load(std::memory_order_relaxed) == false)
std::cout << "thr1 wins";
}
void thr2() {
b.store(true, std::memory_order_seq_cst);
std::atomic_thread_fence(std::memory_order_seq_cst);
if (a.load(std::memory_order_relaxed) == false)
std::cout << "thr2 wins";
}
Note all the loads are relaxed
.
I claim that if thr1 prints "thr1 wins", then we deduce that
a.store(true)
preceded b.store(true)
in the sequential consistency
order S.
To see this, let A be b.load()
and B be b.store(true)
. If
b.load() == false
then we have that A is coherence-ordered before
B. (Apply atomics.order p3.3 with A,B as above, and X the
initialization of b
to false
.)
Now let X be the fence in thr1. Then X happens before A by
sequencing, so X precedes B in S by atomics.order p4.3. That is, the
thr1 fence precedes b.store(true)
. And a.store(true)
, which is
also seq_cst
, precedes the thr1 fence, because the store strongly
happens before the fence by sequencing. So by
transitivity, a.store(true)
precedes b.store(true)
, as claimed.
Similarly, if thr2 prints, then b.store(true)
precedes
a.store(true)
. They can't both precede each other, so we have
therefore proved that the program cannot print both messages.
If you downgrade the fences to acq_rel
, the proof breaks down. In
that case, as far as I can see, nothing prevents the program from
printing thr1 wins
even if b.store(true)
precedes a.store(true)
in the order S. As such, with acq_rel
fences, I believe it is
allowed for both threads to print. Though I'm not sure whether there
is any real implementation where it could actually happen.
We can get an even simpler example if we make all the loads and stores relaxed
, so that the only seq_cst
operations are the fences. Then we can use (4.4) instead to show that if b.load(relaxed)
returns false
, the thr1 fence precedes the thr2 fence, and vice versa if a.load()
returns false
. We therefore conclude, as before, that the program cannot print both messages.
However, if we keep the loads and stores relaxed and weaken the fences to acq_rel
, it is then more clear that we have lost this guarantee. Indeed, with a little prodding, a similar example actually fails on x86, where the acq_rel
fence is a no-op because the loads/stores are already acquire/release. So that's a clearer case where a seq_cst
fence really achieves something that acq_rel
does not.