c++ multithreading c++11 atomic memory-model

Reordering and memory_order_relaxed

Cppreference gives the following description of memory_order_relaxed:

Atomic operations tagged memory_order_relaxed are not synchronization operations, they do not order memory. They only guarantee atomicity and modification order consistency.

It then explains that, with x and y initially zero, this example code

// Thread 1:
r1 = y.load(memory_order_relaxed); // A
x.store(r1, memory_order_relaxed); // B

// Thread 2:
r2 = x.load(memory_order_relaxed); // C 
y.store(42, memory_order_relaxed); // D

is allowed to produce r1 == r2 == 42 because:

1. Although A is sequenced-before B within thread 1 and C is sequenced-before D within thread 2,
2. nothing prevents D from appearing before A in the modification order of y, and B from appearing before C in the modification order of x.

Now my question is: if A and B can't be reordered within thread 1 and, similarly, C and D within thread 2 (since each of those is sequenced-before within its thread), aren't points 1 and 2 in contradiction? In other words, with no reordering (as point 1 seems to require), how is the scenario in point 2, visualized below, even possible?

T1 ........... T2
.............. D(y)
A(y)
B(x)
.............. C(x)

In this case C would not be sequenced-before D within thread 2, which is what point 1 demands.

Solution

with no reordering (as point 1 seems to require)

Point 1 does not mean "no reordering". Sequenced-before only orders evaluations within a single thread of execution; by itself it says nothing about the order in which another thread observes their effects. The compiler will typically emit the CPU instruction for A before B and the CPU instruction for C before D (although even that may be subverted by the as-if rule). Beyond that, the CPU has no obligation to execute them in that order, caches, write buffers, and invalidation queues have no obligation to propagate them in that order, and memory has no obligation to be uniform.

(Individual architectures may offer those guarantees, though.)
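To experiment with this, here is a minimal sketch of the cppreference example wrapped in a repeatable harness. The names run_once and count_both_42 are hypothetical helpers, not part of the original example. Whether r1 == r2 == 42 ever shows up depends on the compiler and the hardware; on many machines this loop will report zero hits, which does not mean the outcome is forbidden by the standard.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// One run of the example from the question. Returns {r1, r2}.
std::pair<int, int> run_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = 0, r2 = 0;

    std::thread t1([&] {
        r1 = y.load(std::memory_order_relaxed);  // A
        x.store(r1, std::memory_order_relaxed);  // B
    });
    std::thread t2([&] {
        r2 = x.load(std::memory_order_relaxed);  // C
        y.store(42, std::memory_order_relaxed);  // D
    });

    // Thread completion synchronizes with join(), so reading r1 and r2
    // afterwards is not a data race.
    t1.join();
    t2.join();
    return {r1, r2};
}

// Hypothetical helper: repeats the test and counts how often
// r1 == r2 == 42 was observed.
int count_both_42(int iterations) {
    int hits = 0;
    for (int i = 0; i < iterations; ++i) {
        std::pair<int, int> r = run_once();
        // r2 == 42 can only come from store B, and B stores r1,
        // so r1 == 0 together with r2 == 42 cannot happen.
        assert(!(r.first == 0 && r.second == 42));
        if (r.first == 42 && r.second == 42) {
            ++hits;
        }
    }
    return hits;
}

// Example use from main():
//   std::printf("%d\n", count_both_42(100000));
```

Starting two fresh threads per iteration keeps the sketch short, but it also makes the two threads overlap only rarely, so a zero count is weak evidence at best. A harness that reuses long-lived threads, run on a weakly ordered architecture, is more likely to expose the outcome.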