Acquire-release on x86

The Intel manual (Vol. 3) gives an example of loads being reordered with earlier stores.

Initially x = y = 0

Core 1:

mov [x], 1
mov r2, [y]

Core 2:

mov [y], 1
mov r1, [x]

So r1 = r2 = 0 is possible. The question is whether requiring acquire-release semantics prohibits this outcome. On x86 every store is already a release store, so I think it does not. Example:

Core 1:

release(mov [x], 1)
mov r2, [y]

Core 2:

mov [y], 1
acquire(mov r1, [x])

In this case, if acquire(mov r1, [x]) observes 0, the only possible conclusion in terms of the C11 memory model is that release(mov [x], 1) does not synchronize-with acquire(mov r1, [x]). That gives no guarantee that would prohibit reordering mov [y], 1 and acquire(mov r1, [x]) on Core 2.

Solution

Correct: acquire/release semantics cannot prevent StoreLoad reordering, i.e. a store followed by a load (to a different location) taking effect in the opposite order. Such reordering is allowed for ordinary load and store instructions on x86.

If you want to avoid such reordering in C11, you need to use memory_order_seq_cst on both the store and the load. In x86 assembly, you need a barrier between the two instructions. mfence serves this purpose, but so does any locked read-modify-write instruction, including xchg with a memory operand, which is implicitly locked even without the lock prefix. So if you look at the generated assembly for memory_order_seq_cst operations, you'll see some such barrier between the store and the load. (For certain reasons, something like lock add [rsp], 0, or xchg between some register and memory whose contents are unimportant, can actually be more performant than mfence, so some compilers will do that even though it looks weird.)
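To make this concrete, here is a minimal C11 sketch of the same litmus test, assuming POSIX threads are available. The function names (core1_acq_rel, count_both_zero, and so on) are made up for illustration. It differs slightly from the question's example in that it uses release for both stores and acquire for both loads. It counts how often r1 == r2 == 0 is observed with acquire/release versus seq_cst.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;
static int r1, r2; /* written by the threads, read only after pthread_join */

/* Acquire/release variant: StoreLoad reordering is still allowed. */
static void *core1_acq_rel(void *arg) {
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_release);
    r2 = atomic_load_explicit(&y, memory_order_acquire);
    return NULL;
}

static void *core2_acq_rel(void *arg) {
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_release);
    r1 = atomic_load_explicit(&x, memory_order_acquire);
    return NULL;
}

/* seq_cst variant: all four operations take part in one total order,
 * so at least one of the loads must observe 1. */
static void *core1_seq_cst(void *arg) {
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_seq_cst);
    r2 = atomic_load_explicit(&y, memory_order_seq_cst);
    return NULL;
}

static void *core2_seq_cst(void *arg) {
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_seq_cst);
    r1 = atomic_load_explicit(&x, memory_order_seq_cst);
    return NULL;
}

/* Hypothetical helper: runs the litmus test `iterations` times and
 * returns how many runs ended with r1 == 0 and r2 == 0. */
int count_both_zero(int use_seq_cst, int iterations) {
    int hits = 0;
    for (int i = 0; i < iterations; i++) {
        pthread_t t1, t2;
        /* Reset the shared locations before starting the threads. */
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        int rc1 = pthread_create(&t1, NULL,
                                 use_seq_cst ? core1_seq_cst : core1_acq_rel,
                                 NULL);
        int rc2 = pthread_create(&t2, NULL,
                                 use_seq_cst ? core2_seq_cst : core2_acq_rel,
                                 NULL);
        assert(rc1 == 0 && rc2 == 0);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        if (r1 == 0 && r2 == 0) {
            hits++;
        }
    }
    return hits;
}
```

Calling count_both_zero(1, n) must return 0 according to the C11 model. count_both_zero(0, n) is allowed to return a nonzero count, but it is not guaranteed to: creating fresh threads on every iteration leaves only a narrow window for the two cores to overlap, so a real reproduction would more likely keep two long-lived threads spinning on a start flag. Compiling both variants and comparing the assembly should show the extra barrier (mfence or a locked instruction) only in the seq_cst version.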