#include <thread>
#include <atomic>
#include <cassert>

std::atomic<bool> x = {false};
std::atomic<bool> y = {false};
std::atomic<int> z = {0};

void write_x()
{
    x.store(true, std::memory_order_release);
}

void write_y()
{
    y.store(true, std::memory_order_release);
}

void read_x_then_y()
{
    while (!x.load(std::memory_order_acquire))
        ;
    if (y.load(std::memory_order_acquire)) {
        ++z;
    }
}

void read_y_then_x()
{
    while (!y.load(std::memory_order_acquire))
        ;
    if (x.load(std::memory_order_acquire)) {
        ++z;
    }
}

int main()
{
    std::thread a(write_x);
    std::thread b(write_y);
    std::thread c(read_x_then_y);
    std::thread d(read_y_then_x);
    a.join(); b.join(); c.join(); d.join();
    assert(z.load() != 0);
}
If I replace seq_cst with acquire/release in cppreference's last example, can assert(z.load() != 0) fail?
Yes, the assert can fire.
The principal property that is not guaranteed by acquire/release is a single total order of modifications. It only guarantees that the (here non-existent) prior actions of a and b are observed by c and d if their acquire loads see true.
A (slightly contrived) example of this is a multi-CPU (physical socket) system that isn't fully cache-coherent. Die 1 has core A running thread a and core C running thread c. Die 2 has core B running thread b and core D running thread d. The interconnect between the two sockets has a long latency compared to a memory operation that hits on-die cache.

a and b run at the same wall-clock time. C is on-die with A, so it can see the store to x immediately, but the interconnect delays its observation of the store to y, so it sees the old value. Similarly, D is on-die with B, so it sees the store to y but misses the store to x.
Whereas if you have sequential consistency, some co-ordination is required to enforce a total order, such as "C and D are blocked while the interconnect syncs the caches".