c++language-lawyer memory-model relaxed-atomics

How does C++23 happens before apply to std::memory_order_seq_cst?

Consider the following C++23 program:

#include <atomic>
#include <cassert>
#include <thread>

#define LOAD_ORDER std::memory_order_seq_cst
#define STORE_ORDER std::memory_order_seq_cst

std::atomic_int v[2]{0, 0};
int r[2];

void
f(int i)
{
  v[i].store(1, STORE_ORDER);
  r[i] = v[1-i].load(LOAD_ORDER);
}

int
main()
{
  {
    std::jthread t0(f, 0);
    std::jthread t1(f, 1);
  }
  assert(r[0] || r[1]);
  return 0;
}

One thread stores 1 to v[0] then loads v[1]. Concurrently, the other thread stores 1 to v[1] then loads v[0]. I want to ensure that at least one load returns 1. My questions:

At a high level, sequentially consistent operations cannot be reordered, so this should work. But how do you prove the above code is correct using only the sequenced before, synchronizes with, inter-thread happens before, and visible side effect concepts in the language spec? For instance, it doesn't seem to be the case that one store happens before the other.
Would the code still be correct if I changed LOAD_ORDER to std::memory_order_relaxed and left STORE_ORDER as std::memory_order_seq_cst? What about the converse of relaxing STORE_ORDER but not load?
What's the most relaxed or efficient value that I can set LOAD_ORDER and STORE_ORDER to and still guarantee the assertion will not fail?

Assume all the variables are on different cache lines--I just didn't add padding to make the example simpler. Also assume no consume operations, so we can ignore the distinctions between weakly/strongly/plain happens before.

Solution

Let's go through a couple of rules and scenarios.

t0-store is sequenced before t0-load, therefore t0-store strongly happens before t0-load; same with t1.

Without sequential consistency

In the absence of sequential consistency, we need to provide some inter-thread happens before relation between t0 and t1. Do any of the rules apply?

Does t0-store synchronize with t1-store or load? No. There is no relation with t1-store because they are independent atomic variables. It may synchronize with t1-load as a release-acquire pair but that is not guaranteed
Are there any dependency-orderings? No. The operations are all independent
Does t1-load synchronize with an evaluation X and X is sequenced before t0-store? There is no relevant operation before t0-store
Is t0-store sequenced before X and X inter-thread happens before t1-load? The only operation sequenced after t0-store is t0-load. That has no inter-thread relation with t1-load
Does t0-store inter-thread happen before X and X inter-thread happens before t1-load? That's just the transitive property. We only have two threads so it does not apply

Therefore the operations are potentially concurrent.. This is to be expected. Informally, an acquire load can be reordered before a release store and vice versa.

What we do know is that t0-store may be coherence-ordered before t1-load, and/or t1-store before t0-load.

With sequential consistency

Now let's introduce sequentially consistent operations into the mix. This gives us a single total order S with a set of rules. In particular this:

First, if A and B are memory_order::seq_cst operations and A strongly happens before B, then A precedes B in S.

We have established that t0-store strongly happens before t0-load. So t0-store precedes t0-load in S. Same with t1. Therefore, in the single total order, the loads always come after the store by their own thread. You cannot have a load in the single total order before at least one store. So at least one store will always be visible to a load.

The only other rule regarding sequentially consistent operations (not fences) is this:

Second, for every pair of atomic operations A and B on an object M, where A is coherence-ordered before B, the following four conditions are required to be satisfied by S:

if A and B are both memory_order::seq_cst operations, then A precedes B in S

All that means for us is that if a load sees the value of a store, it is ordered after it in the single total order. Not much surprise there.

At this point it is important to note that the rules only apply to sequentially consistent operations. There are no rules that apply between sequentially consistent and other atomic operations. With regard to any other operations, sequentially consistent operations simply act as release stores or acquire loads. The standard is very explicit in warning against this style of mixing.

Note 5: memory_order::seq_cst ensures sequential consistency only for a program that is free of data races and uses exclusively memory_order::seq_cst atomic operations. Any use of weaker ordering will invalidate this guarantee unless extreme care is used. In many cases, memory_order::seq_cst atomic operations are reorderable with respect to other atomic operations performed by the same thread.