By strictly stronger or equally strong, I mean any acq_rel can be replaced by seq_cst, and this does not weaken any guarantees provided by acq_rel semantics.
When I read cppreference literally:
memory_order_acq_rel
A read-modify-write operation with this memory order is both an acquire operation and a release operation. No memory reads or writes in the current thread can be reordered before the load, nor after the store. All writes in other threads that release the same atomic variable are visible before the modification and the modification is visible in other threads that acquire the same atomic variable.
memory_order_seq_cst
A load operation with this memory order performs an acquire operation, a store performs a release operation, and read-modify-write performs both an acquire operation and a release operation, plus a single total order exists in which all threads observe all modifications in the same order (see Sequentially-consistent ordering below).
I noticed seq_cst is missing a sentence that shows up in acq_rel:
No memory reads or writes in the current thread can be reordered before the load, nor after the store.
And the reference further mentions (in a different section):
in many cases, memory_order_seq_cst atomic operations are reorderable with respect to other atomic operations performed by the same thread.
These made me feel like acq_rel is a two-way barrier while seq_cst is no barrier at all. On the other hand, every text I have read says seq_cst provides stronger guarantees. I don't know 1) whether seq_cst is also a barrier like acq_rel, and 2) whether it's safe to "upgrade" any acq_rel to seq_cst.
SC (seq_cst) is acq_rel plus extra guarantees. A few possibilities for the wording choices:
- acq_rel can be summarized more simply; seq_cst has extra guarantees about not reordering with other SC operations.
- seq_cst is no weaker than acq_rel. That is indeed a true fact.
- Remember it's not the normative standard's wording; it tries to translate from standardese into something easier to think about (such as memory ordering in terms of local reordering of accesses to a coherent shared cache, a concept the ISO C++ standard doesn't include at all; it only talks about happens-before guarantees or lack thereof).
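For question 2), here is a minimal sketch assuming a hypothetical test-and-set spinlock (SpinLock and count_with_lock are illustrative names, not from the question). The lock and unlock orders are template parameters, so the same code can be instantiated with an acq_rel exchange plus a release store, or with everything upgraded to seq_cst. Both instantiations are correct because seq_cst is no weaker; the upgrade can only cost performance. A passing run is not a proof of correctness, just a sanity check.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical spinlock with the memory orders as template parameters.
template <std::memory_order LockOrder, std::memory_order UnlockOrder>
class SpinLock {
    std::atomic<bool> locked_{false};

public:
    void lock() {
        // exchange is an RMW; it needs at least acquire semantics so the
        // critical section can't move above it.
        while (locked_.exchange(true, LockOrder)) {
            std::this_thread::yield();
        }
    }
    // Needs at least release so the critical section can't move below it.
    void unlock() { locked_.store(false, UnlockOrder); }
};

// Several threads increment a plain (non-atomic) counter under the lock.
// The lock's acquire/release ordering is what makes this race-free.
template <std::memory_order LockOrder, std::memory_order UnlockOrder>
long count_with_lock(int threads, int iters) {
    SpinLock<LockOrder, UnlockOrder> lk;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&lk, &counter, iters] {
            for (int i = 0; i < iters; ++i) {
                lk.lock();
                ++counter;
                lk.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Both count_with_lock<std::memory_order_acq_rel, std::memory_order_release>(4, 10000) and count_with_lock<std::memory_order_seq_cst, std::memory_order_seq_cst>(4, 10000) should return 40000.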
Fun fact: real implementations typically handle runtime-variable memory_order parameters by treating them as seq_cst instead of branching to maybe run fences or not.
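A small sketch of what a "runtime-variable memory_order" looks like (store_with_order and load_with_order are hypothetical helpers): the order is an ordinary function argument rather than a compile-time constant. Whether a particular compiler emits the seq_cst sequence or a branch for such a call, when it isn't inlined with a constant argument, is an implementation detail worth checking in the generated asm.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helpers whose memory order is a run-time value.
// Precondition: order must be valid for the operation (a store accepts
// relaxed, release or seq_cst; a load accepts relaxed, consume, acquire or
// seq_cst). Passing e.g. acquire to a store is undefined behavior.
void store_with_order(std::atomic<int>& a, int v, std::memory_order order) {
    // If this isn't inlined with a constant order, the compiler can't pick
    // the barrier at compile time; emitting the seq_cst sequence is always safe.
    a.store(v, order);
}

int load_with_order(const std::atomic<int>& a, std::memory_order order) {
    return a.load(order);
}
```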
No memory reads or writes in the current thread can be reordered before the load, nor after the store.
This is already implied by the load being acquire and the store being release (https://preshing.com/20120913/acquire-and-release-semantics/), which both paragraphs mention.
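A minimal message-passing sketch of that implication (message_passing_once is a hypothetical name): the plain write to payload can't be reordered after the release store, and the read of payload can't be reordered before the acquire load, so a reader that sees ready == true must see 42.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Message passing with acquire/release; returns the payload the reader saw.
int message_passing_once() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};

    std::thread writer([&] {
        payload = 42;                                  // can't move after the release store
        ready.store(true, std::memory_order_release);  // publishes payload
    });

    // Reader (this thread): once the acquire load sees true, the write to
    // payload is visible, and the read below can't be hoisted above the load.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    int seen = payload;
    writer.join();
    return seen;
}
```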
Also, the guarantees are stronger for seq_cst, so that sentence wouldn't be as useful a summary: earlier seq_cst stores also can't reorder with the load side of a seq_cst RMW, and later seq_cst loads can't reorder with the store side of a seq_cst RMW. (I think that's true: the individual parts of the RMW are at least as strong as separate SC operations.)
Non-SC loads and stores on either side of an SC RMW are limited by it exactly as they would be by an acq_rel RMW.
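One way to exercise the RMW claim is a store-buffer (Dekker-style) litmus test, sketched here using fetch_add(0) as a stand-in for "an RMW that only needs to read" (that choice, and the function name, are assumptions for illustration). Each thread does a seq_cst store, then a seq_cst RMW on the other variable. All four operations are seq_cst, so they must fit into the single total order S, which rules out both RMWs reading 0. A run that never shows the outcome only fails to contradict the rule; it proves nothing on its own.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Returns true if the forbidden outcome r1 == 0 && r2 == 0 was observed.
bool sc_rmw_store_load_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);
        // fetch_add(0) reads without changing the value; its load side
        // can't reorder with the earlier seq_cst store.
        r1 = y.fetch_add(0, std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.fetch_add(0, std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    // Whichever store is first in S, the other thread's RMW comes after it.
    return r1 == 0 && r2 == 0;
}
```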
These made me feel like acq_rel is a two-way barrier while seq_cst is no barrier at all.
An acq_rel operation is not a two-way barrier for other operations on opposite sides of it. The store side can reorder with later loads, and maybe stores, as long as they're to different locations.
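For contrast, the same litmus shape with acq_rel RMWs and acquire loads sketches the reordering that is allowed (acq_rel_store_load and Outcomes are hypothetical names). The C++ model permits r1 == 0 && r2 == 0 here, because the store side of each exchange may become visible only after the later load of the other variable. Whether that outcome ever shows up depends on hardware and compiler: on x86 the xchg is a full barrier, so it isn't expected there, and spawning fresh threads per iteration makes overlap rare anyway.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Outcomes {
    int both_zero = 0;  // allowed by the C++ model, but maybe never observed
    int other = 0;
};

Outcomes acq_rel_store_load(int iterations) {
    Outcomes out;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread t1([&] {
            x.exchange(1, std::memory_order_acq_rel);  // store side may be delayed...
            r1 = y.load(std::memory_order_acquire);    // ...past this load
        });
        std::thread t2([&] {
            y.exchange(1, std::memory_order_acq_rel);
            r2 = x.load(std::memory_order_acquire);
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) ++out.both_zero; else ++out.other;
    }
    return out;
}
```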
For purposes of ordering, is atomic read-modify-write one operation or two?
acq_rel and seq_cst fences are two-way barriers (with acq_rel not blocking StoreLoad reordering). seq_cst also does whatever else is required to avoid IRIW reordering on ISAs where that's possible; acq_rel doesn't rule it out. (Will two atomic writes to different locations in different threads always be seen in the same order by other threads?)
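The StoreLoad difference between the two kinds of fences can be sketched with the same litmus shape using only relaxed accesses (sc_fence_store_load_once is a hypothetical name). With a seq_cst fence between each thread's store and load, the two fences are ordered in S, so at least one load must see the other thread's store. With acq_rel fences in the same places, r1 == 0 && r2 == 0 would be allowed again.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Returns true if the forbidden outcome r1 == 0 && r2 == 0 was observed.
bool sc_fence_store_load_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // blocks StoreLoad
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    // Whichever fence is first in S, the load after the other fence must
    // see the store sequenced before the first fence.
    return r1 == 0 && r2 == 0;
}
```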
Re: the original phrasing of the question: seq_cst isn't strictly stronger than acq_rel; ISO C++ doesn't require acq_rel to be weaker in any particular way. A valid implementation could promote everything to seq_cst, making the two exactly equal.
On real implementations, though, seq_cst normally is stronger for at least some operations. (E.g. on x86, SC stores are done with xchg or mov+mfence to prevent StoreLoad reordering with later loads. Everything else is the same between SC and acq_rel operations, although acq_rel fences are a no-op on x86 while SC fences are mfence or a dummy locked operation.)
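To see the x86 difference directly, a sketch like the following can be compiled to assembly (e.g. with g++ -O2 -S or on Compiler Explorer) and the functions compared; the function names are illustrative. The comments describe the typical x86-64 code generation discussed above, but the exact choice (xchg vs. mov+mfence, mfence vs. a dummy locked instruction) varies by compiler and version.

```cpp
#include <atomic>
#include <cassert>

// Release store: typically a plain mov on x86-64.
void store_release(std::atomic<int>& a, int v) { a.store(v, std::memory_order_release); }
// SC store: typically xchg, or mov followed by mfence.
void store_seq_cst(std::atomic<int>& a, int v) { a.store(v, std::memory_order_seq_cst); }

// acq_rel fence: emits no instruction (it still restricts compile-time reordering).
void fence_acq_rel() { std::atomic_thread_fence(std::memory_order_acq_rel); }
// SC fence: typically mfence or a dummy lock-prefixed instruction.
void fence_seq_cst() { std::atomic_thread_fence(std::memory_order_seq_cst); }

// Both RMWs are typically lock xadd: every locked RMW on x86 is already a full barrier.
int rmw_acq_rel(std::atomic<int>& a) { return a.fetch_add(1, std::memory_order_acq_rel); }
int rmw_seq_cst(std::atomic<int>& a) { return a.fetch_add(1, std::memory_order_seq_cst); }
```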
ARMv8.0 without ldapr might promote acquire and acq_rel to seq_cst: ldar loads wait for previous release operations to drain from the store buffer. But ARMv8.3-A ldapr (partially ordered with respect to stlr) is just acquire, without that special interaction with release (and SC) stores from this core.
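A similar sketch for AArch64 (function names are illustrative): compiling it for a baseline ARMv8.0 target and for a target with the RCpc extension (e.g. -march=armv8.3-a) may show the acquire load changing from ldar to ldapr while the seq_cst load stays ldar. Whether ldapr is actually used depends on the compiler version and target flags, so treat the comments as expectations to verify, not guarantees.

```cpp
#include <atomic>
#include <cassert>

// Release store: typically stlr.
void publish_release(std::atomic<int>& a, int v) { a.store(v, std::memory_order_release); }

// Acquire load: ldar on ARMv8.0; may become ldapr when RCpc is available,
// which doesn't have to wait for earlier stlr stores from this core.
int load_acquire(const std::atomic<int>& a) { return a.load(std::memory_order_acquire); }

// SC load: typically ldar, which can't be reordered with an earlier stlr.
int load_seq_cst(const std::atomic<int>& a) { return a.load(std::memory_order_seq_cst); }
```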