#include <thread>
#include <atomic>
#include <cassert>

std::atomic<bool> x = {false};
std::atomic<bool> y = {false};
std::atomic<int> z = {0};

void write_x()
{
    x.store(true, std::memory_order_release);
}

void write_y()
{
    y.store(true, std::memory_order_release);
}

void read_x_then_y()
{
    while (!x.load(std::memory_order_acquire))
        ;
    if (y.load(std::memory_order_acquire)) {
        ++z;
    }
}

void read_y_then_x()
{
    while (!y.load(std::memory_order_acquire))
        ;
    if (x.load(std::memory_order_acquire)) {
        ++z;
    }
}

int main()
{
    std::thread a(write_x);
    std::thread b(write_y);
    std::thread c(read_x_then_y);
    std::thread d(read_y_then_x);
    a.join(); b.join(); c.join(); d.join();
    assert(z.load() != 0);
}
If I replace seq_cst with acquire/release in cppreference's last example, can assert(z.load() != 0) fail?
Yes, the assert can fire.
The principal property that is not guaranteed by acquire/release is a single total order of modifications. It only guarantees that the (here non-existent) prior actions of a and b are observed by c and d if their acquire loads see true.
A (slightly contrived) example of this is a multi-CPU (physical socket) system that isn't fully cache-coherent. Die 1 has core A running thread a and core C running thread c. Die 2 has core B running thread b and core D running thread d. The interconnect between the two sockets has a long latency compared to a memory operation that hits on-die cache.

a and b run at the same wall-clock time. C is on-die with A, so it can see the store to x immediately, but the interconnect delays its observation of the store to y, so it sees the old value. Similarly, D is on-die with B, so it sees the store to y but misses the store to x.
Whereas if you have sequential consistency, some co-ordination is required to enforce a total order, such as "C and D are blocked while the interconnect syncs the caches".