Tags: c++, atomic, memory-barriers, memory-model, stdatomic

Acquire/Release VS Sequential Consistency in C++11?


#include <thread>
#include <atomic>
#include <cassert>

std::atomic<bool> x = {false};
std::atomic<bool> y = {false};
std::atomic<int> z = {0};

void write_x()
{
    x.store(true, std::memory_order_release);
}

void write_y()
{
    y.store(true, std::memory_order_release);
}

void read_x_then_y()
{
    while (!x.load(std::memory_order_acquire))
        ;
    if (y.load(std::memory_order_acquire)) {
        ++z;
    }
}

void read_y_then_x()
{
    while (!y.load(std::memory_order_acquire))
        ;
    if (x.load(std::memory_order_acquire)) {
        ++z;
    }
}

int main()
{
    std::thread a(write_x);
    std::thread b(write_y);
    std::thread c(read_x_then_y);
    std::thread d(read_y_then_x);
    a.join(); b.join(); c.join(); d.join();
    assert(z.load() != 0);
}

If I replace seq_cst with acquire/release in cppreference's last example, can assert(z.load() != 0) fail?

  • seq_cst can prevent StoreLoad reordering, but this code has no store followed by a load in the same thread.
  • Acquire can prevent LoadLoad reordering.
  • Release can prevent StoreStore reordering.

Solution

  • Yes, the assert can fire.

    The principal property that acquire/release does not guarantee is a single total order of modifications across different atomic variables. It only guarantees that the (here non-existent) earlier actions of a and b are visible to c and d once their loads return true.

    A (slightly contrived) example of this is a multi-CPU (multiple physical socket) system that isn't fully cache-coherent. Die 1 has core A running thread a and core C running thread c. Die 2 has core B running thread b and core D running thread d. The interconnect between the two sockets has a long latency compared to a memory operation that hits the on-die cache.

    Threads a and b run at the same wall-clock time. C is on-die with A, so it can see the store to x immediately, but the interconnect delays its observation of the store to y, so it sees the old value. Similarly, D is on-die with B, so it sees the store to y but misses the store to x. Neither c nor d increments z, so the assert fires.

    With sequential consistency, by contrast, some co-ordination is required to enforce a total order, such as "C and D are blocked while the interconnect syncs the caches".
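
For comparison, here is a minimal sketch of the same four-thread experiment with every operation using memory_order_seq_cst, which is the form the cppreference example takes. The helper name run_seq_cst_trial is made up for illustration and is not part of the original post or of any library. Because all seq_cst operations take part in one total order, one of the two stores comes first in that order, and the reader that waited on the other flag must then see both flags set, so z ends up as 1 or 2.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper (not from the original post): runs the experiment
// once with memory_order_seq_cst everywhere and returns the final z.
int run_seq_cst_trial()
{
    std::atomic<bool> x{false};
    std::atomic<bool> y{false};
    std::atomic<int> z{0};

    // Writers: both stores are placed in the single total order that
    // all seq_cst operations share.
    std::thread a([&] { x.store(true, std::memory_order_seq_cst); });
    std::thread b([&] { y.store(true, std::memory_order_seq_cst); });

    // Readers: spin until one flag is set, then check the other one.
    std::thread c([&] {
        while (!x.load(std::memory_order_seq_cst))
            ;
        if (y.load(std::memory_order_seq_cst))
            ++z;
    });
    std::thread d([&] {
        while (!y.load(std::memory_order_seq_cst))
            ;
        if (x.load(std::memory_order_seq_cst))
            ++z;
    });

    a.join(); b.join(); c.join(); d.join();

    // Whichever store is later in the total order, the reader spinning
    // on it must then observe the earlier store too, so z is 1 or 2.
    assert(z.load() != 0);
    return z.load();
}
```

Calling run_seq_cst_trial() from main, possibly in a loop, should never trip the assert. Note that a test run cannot demonstrate the reverse: on x86 the hardware ordering is strong enough that the acquire/release version is unlikely to ever produce z == 0, even though the C++ memory model permits it.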