Is this seqlock implementation correct from the c++ memory model point of view?

I red this article Can Seqlocks Get Along With Programming Language Memory Models?

And found that one of the example (figure 6) is not correct from the c++ memory model point of view .

atomic<unsigned> seq; // seqlock representation
// assume big enough to ignore overflow
atomic<int> data1, data2; // protected by seq

T reader() {
    int r1, r2;
    unsigned seq0, seq1;
    do {
        seq0 = seq.load(m_o_acquire);
        r1 = data1.load(m_o_relaxed);
        r2 = data2.load(m_o_relaxed);
        atomic_thread_fence(m_o_acquire);
        seq1 = seq.load(m_o_relaxed);
    } while (seq0 != seq1 || seq0 & 1);
    // do something with r1 and r2;
}

void writer(...) {
    unsigned seq0 = seq;
    while (seq0 & 1 ||
           !seq.compare_exchange_weak(seq0, seq0+1)) {}
    data1 = ...;
    data2 = ...;
    seq = seq0 + 2;
}

Figure 6. Seqlock reader with acquire fence

Here it is assumed that at the time of reading seq0 there was synchronize with the reader and the "last" writer, but the memory model does not guarantee this. To do this, we must force synchronize with an RMW operation (e.g. seq.fetch_add(0) or seq.cas(seq0, seq0)) when reading seq1 and if seq1 == seq0, therefore at the time of reading seq0 there was synchronize with with "last" writer => happens before

Solution

Cppref memory order

Synchronizes with

If an atomic store in thread A is a release operation, an atomic load in thread B from the same variable is an acquire operation, and the load in thread B reads a value written by the store in thread A, then the store in thread A synchronizes-with the load in thread B.

Reading seq0 with an acquire order is enough to establish a synchronizes with relationship with the writer, you don't need an RMW here, you either read an old value which will make the whole operation fail and retried or you read the new value which will establish the synchronizes with relationship, there is no third option.

If you read the old value for seq0 and read new data then the fence will force seq1 to be the new value, which will fail the check, there is no scenario where seq0 == seq1 and the data is not the same one written before seq0

The main problem with seqlock is that writes can be reordered by the CPU which is why a C++ memory-model compliant implementation needs a compare-exchange on the writer side, which makes it not exactly a seqlock, but a slower modified version of it.

While the provided seqlock has no races, it can read any value for the data in the modification order, not necessarily the newest, doing an RMW to have this guarantee would make it an order of magnitude slower because of cache ping-ponging.