Is `seq_cst` strictly stronger than, or equally strong as, `acq_rel`?

By strictly stronger or equally strong, I mean that any acq_rel can be replaced by seq_cst without weakening any guarantee that acq_rel semantics provide.

When I read the cpp reference literally:

memory_order_acq_rel A read-modify-write operation with this memory order is both an acquire operation and a release operation. No memory reads or writes in the current thread can be reordered before the load, nor after the store. All writes in other threads that release the same atomic variable are visible before the modification and the modification is visible in other threads that acquire the same atomic variable.

memory_order_seq_cst A load operation with this memory order performs an acquire operation, a store performs a release operation, and read-modify-write performs both an acquire operation and a release operation, plus a single total order exists in which all threads observe all modifications in the same order (see Sequentially-consistent ordering below).

I noticed seq_cst is missing a sentence that shows up in acq_rel:

No memory reads or writes in the current thread can be reordered before the load, nor after the store.

And the reference further mentions (in a different section):

in many cases, memory_order_seq_cst atomic operations are reorderable with respect to other atomic operations performed by the same thread.

These made me feel like acq_rel is a two-way barrier while seq_cst is no barrier at all. On the other hand, every text I have read says seq_cst provides stronger guarantees. I don't know 1) whether seq_cst is also a barrier like acq_rel, and 2) whether it's safe to "upgrade" any acq_rel to seq_cst.

Solution

SC is acq_rel plus extra guarantees. A few possibilities for the wording choices:

acq_rel can be summarized more simply; seq_cst has extra guarantees about not reordering with other SC operations.
Or the authors of the cppreference wiki are avoiding repeating themselves, assuming the reader already knows that seq_cst is no weaker than acq_rel (which is true).

Remember that cppreference is not the normative wording of the standard; it tries to translate standardese into something easier to think about. (Such as memory ordering in terms of local reordering of accesses to coherent shared cache, which is a concept the ISO C++ standard doesn't include at all. The standard only talks about happens-before guarantees or the lack thereof.)

Fun fact: real implementations typically handle runtime-variable memory_order parameters by treating them as seq_cst instead of branching to maybe run fences or not.
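To make the "runtime-variable memory_order" case concrete, here is a minimal sketch. The helper name `load_with` is hypothetical and only for illustration. Because the order is not a compile-time constant here, a compiler may simply emit the strongest (seq_cst) code path, which is valid for every order a caller is allowed to pass.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: the memory order is only known at run time.
int load_with(const std::atomic<int>& a, std::memory_order order) {
    // release and acq_rel are not valid orders for a pure load.
    assert(order != std::memory_order_release &&
           order != std::memory_order_acq_rel);
    // An implementation may compile this as an unconditional seq_cst load
    // instead of branching on 'order'; seq_cst satisfies any weaker request.
    return a.load(order);
}
```

Passing a constant such as `std::memory_order_acquire` directly at the call site of `a.load(...)` is what lets the compiler pick the cheaper instruction sequence.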

No memory reads or writes in the current thread can be reordered before the load, nor after the store.

This is already implied by the load being acquire and the store being release (https://preshing.com/20120913/acquire-and-release-semantics/) which both paragraphs mention.

Also, the guarantees are stronger for seq_cst, so that wouldn't be as useful a summary: earlier seq_cst stores also can't reorder with the load side of a seq_cst RMW, and later seq_cst loads can't reorder with the store side of a seq_cst RMW. (I think that's true: the individual parts of the RMW are at least as strong as separate SC operations.)

Non-SC loads and stores on either side of the SC RMW are still as limited by it as by an acq_rel RMW.
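As a sketch of what "upgrading" looks like in practice, the following publishes a plain (non-atomic) value through an RMW whose order is a template parameter. The names `publish_and_read`, `payload`, and `flag` are made up for this example. The ordering of the plain write and read relies only on the release side and the acquire side of the RMW, so it should hold equally for acq_rel and seq_cst.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo: Order is expected to be acq_rel or seq_cst.
template <std::memory_order Order>
int publish_and_read() {
    int payload = 0;            // plain, non-atomic data
    std::atomic<int> flag{0};

    std::thread producer([&] {
        payload = 42;            // cannot move after the release side below
        flag.fetch_add(1, Order);
    });

    // fetch_add(0) is an RMW that just reads the current value; its acquire
    // side synchronizes with the producer's RMW once it observes 1.
    while (flag.fetch_add(0, Order) == 0) {
        std::this_thread::yield();
    }
    int seen = payload;          // cannot move before the acquire side above

    producer.join();
    assert(flag.load() == 1);
    return seen;
}
```

Instantiating it as `publish_and_read<std::memory_order_acq_rel>()` or `publish_and_read<std::memory_order_seq_cst>()` should return 42 either way; instantiating it with `memory_order_relaxed` would introduce a data race on `payload`.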

These made me feel like acq_rel is a two-way barrier while seq_cst is no barrier at all.

An acq_rel operation is not a two-way barrier for other operations on opposite sides of it. The store side can reorder with later loads and maybe stores, as long as they're to different locations.

Related: For purposes of ordering, is atomic read-modify-write one operation or two?

acq_rel and seq_cst fences are two-way barriers (with acq_rel not blocking StoreLoad reordering). seq_cst also does any other special stuff required to avoid IRIW reordering on ISAs where that's possible; acq_rel doesn't rule it out. (Will two atomic writes to different locations in different threads always be seen in the same order by other threads?)
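The IRIW (independent reads of independent writes) case can be sketched as a litmus test. The function names are hypothetical. With every operation seq_cst, the two readers must agree on the order of the two writes, so the "disagree" outcome is forbidden. With release stores and acquire loads instead, the C++ model would allow it, although most hardware will never show it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Returns true if the two readers saw the two independent writes in
// opposite orders. With seq_cst everywhere this must never happen.
bool iriw_readers_disagree() {
    std::atomic<int> x{0}, y{0};
    int r1 = 0, r2 = 0, r3 = 0, r4 = 0;

    std::thread w1([&] { x.store(1, std::memory_order_seq_cst); });
    std::thread w2([&] { y.store(1, std::memory_order_seq_cst); });
    std::thread rd1([&] {
        r1 = x.load(std::memory_order_seq_cst);
        r2 = y.load(std::memory_order_seq_cst);
    });
    std::thread rd2([&] {
        r3 = y.load(std::memory_order_seq_cst);
        r4 = x.load(std::memory_order_seq_cst);
    });
    w1.join(); w2.join(); rd1.join(); rd2.join();

    // Reader 1: x written, y not yet. Reader 2: y written, x not yet.
    return r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0;
}

// Repeats the litmus test; the forbidden outcome must never appear.
void check_iriw(int runs) {
    for (int i = 0; i < runs; ++i) {
        assert(!iriw_readers_disagree());
    }
}
```

A test like this can only confirm that the forbidden outcome did not show up; it cannot demonstrate the weaker behaviour on machines that never reorder this way.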

re: original phrasing of the question:

seq_cst isn't strictly stronger than acq_rel; ISO C++ doesn't require acq_rel to be weaker in any particular way. A valid implementation could promote everything to seq_cst so they were exactly equal.

On real implementations, though, seq_cst normally is stronger for at least some operations. (e.g. on x86, SC stores are done with xchg or mov+mfence to prevent StoreLoad reordering with later loads. Everything else is the same between SC and acq_rel operations, although acq_rel fences are a no-op on x86 while SC fences are mfence or a dummy locked operation.)
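The StoreLoad case is the classic store-buffer litmus test, sketched below with hypothetical function names. With seq_cst stores and loads, at least one thread must see the other's store, so both reads returning 0 is forbidden. With release stores and acquire loads, that outcome is allowed and is observable on x86, because a plain `mov` store can sit in the store buffer while the later load executes.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Returns true if both threads read 0, i.e. each load effectively ran
// before the other thread's store became visible.
bool store_buffer_both_zero() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);   // e.g. xchg on x86
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join(); t2.join();

    return r1 == 0 && r2 == 0;
}

// Repeats the litmus test; seq_cst forbids the both-zero outcome.
void check_store_buffer(int runs) {
    for (int i = 0; i < runs; ++i) {
        assert(!store_buffer_both_zero());
    }
}
```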

ARMv8.0 without ldapr might promote acquire and acq_rel to seq_cst: ldar loads wait for previous release operations to drain from the store buffer. But ARMv8.3-A ldapr (partial-order wrt. stlr) is just acquire without that special interaction with release (and SC) stores from this core.