c++ assembly arm volatile memory-barriers

How does the compiler enforce C++ volatile in ARM assembly?

According to cppreference, a store to one volatile-qualified variable cannot be reordered with respect to a store to another volatile-qualified variable. In other words, in the example below, when y becomes 20, x is guaranteed to be 10.

volatile int x, y;
...
x = 10;
y = 20;

According to Wikipedia, on an ARM processor a store can be reordered after another store. So, in the example below, the second store can be executed before the first store, since the two destinations are disjoint and the stores can therefore be freely reordered.

str     r1, [r3]
str     r2, [r3, #4]

With this understanding, I wrote a toy program:

volatile int x, y;

int main() {
    x = 10;
    y = 20;
}

I expected some fencing to be present in the generated assembly to guarantee the store order of x and y. But the generated assembly for ARM was:

main:
        movw    r3, #:lower16:.LANCHOR0
        movt    r3, #:upper16:.LANCHOR0
        movs    r1, #10
        movs    r2, #20
        movs    r0, #0
        str     r1, [r3]
        str     r2, [r3, #4]
        bx      lr
x:
y:

So, how is the store order enforced here?

Solution

Quoting the question: "So, in the below example, second store can be executed before first store since both destinations are disjoint, and hence they can be freely reordered."

The volatile keyword limits the reordering (and elision) of instructions by the compiler, but its semantics don't say anything about visibility from other threads or processors.

When you see

        str     r1, [r3]
        str     r2, [r3, #4]

then volatile has done everything required. If the addresses of x and y are I/O mapped to a hardware device, it will have received the x store first. If an interrupt pauses operation of this thread between the two instructions, the interrupt handler will see the x store and not the y. That's all that is guaranteed.
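As a rough illustration of the memory-mapped I/O case, here is a minimal sketch. The DeviceRegs layout, the register names and start_transfer are hypothetical and not taken from any real device; a fake in-memory register block stands in for the hardware so the code can run anywhere. On a real ARM system this also assumes the registers are mapped as device memory.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register block: the layout and names are illustrative only.
struct DeviceRegs {
    volatile std::uint32_t data;     // plays the role of x
    volatile std::uint32_t control;  // plays the role of y
};

// Because both members are volatile, the compiler must emit both stores,
// and must emit them in this order (data first, then control).
void start_transfer(DeviceRegs& dev, std::uint32_t value) {
    dev.data = value;
    dev.control = 1u;
}

// Exercises the function against an ordinary object instead of real hardware.
std::uint32_t demo_with_fake_device() {
    DeviceRegs fake{};
    start_transfer(fake, 42u);
    assert(fake.control == 1u);
    return fake.data;
}
```

Without volatile, the compiler would be free to swap, merge or drop these stores if it could prove the program's own behaviour is unaffected.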

The memory ordering model only describes the order in which effects become observable from other processors. It doesn't change the sequence in which instructions are issued (which is the order they appear in the assembly code); it governs the order in which they are committed (i.e., when a store becomes externally visible).

It is certainly possible that a different processor could see the result of the y store before the x store, but volatile is not, and never has been, relevant to that problem. The cross-platform solution to this is std::atomic.
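For the question's x and y, a minimal sketch of the std::atomic approach might look like the following. The function names writer, reader and publish_and_observe are made up for illustration, and release/acquire ordering is one reasonable choice here rather than the only one. The release store to y guarantees that a thread which observes y == 20 through an acquire load also observes x == 10.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int x = 0;              // ordinary data, published through y
std::atomic<int> y{0};  // the atomic flag carries the ordering guarantee

void writer() {
    x = 10;
    // Release: all earlier writes in this thread become visible to any
    // thread that later observes this store with an acquire load.
    y.store(20, std::memory_order_release);
}

int reader() {
    // Acquire: once 20 is observed, the write x = 10 is visible too.
    while (y.load(std::memory_order_acquire) != 20) {
        std::this_thread::yield();
    }
    return x;
}

// Runs one writer and one reader and returns what the reader saw in x.
int publish_and_observe() {
    int seen = 0;
    std::thread consumer([&seen] { seen = reader(); });
    std::thread producer(writer);
    producer.join();
    consumer.join();
    assert(seen == 10);
    return seen;
}
```

Compiled for ARM, code like this would typically contain the barrier instructions (such as dmb) that the question expected to find, because here the language requires cross-thread ordering.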

Unfortunately, there is a lot of obsolete C code on the internet that does use volatile for synchronization, but this is always a platform-specific extension and was never a great idea anyway. Even less fortunately, the keyword was given exactly those semantics in Java (which isn't really used for writing interrupt handlers), increasing the confusion.

If you do see something using volatile like this, it's either obsolete or was incompetently translated from Java. Use std::atomic, and for anything more complex than simple atomic load/store, it's probably better (and is certainly easier) to use std::mutex.
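For the "more complex than simple atomic load/store" case, a sketch with std::mutex could look like this. The SharedPoint class and the y == 2 * x invariant are invented purely for illustration. The mutex makes each update of the pair appear as a single step to other threads, so a reader never sees a half-written state.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

struct Point {
    int x;
    int y;
};

// Hypothetical wrapper: both fields are only touched while holding the lock.
class SharedPoint {
public:
    void set(int x, int y) {
        std::lock_guard<std::mutex> lock(mutex_);
        value_.x = x;
        value_.y = y;
    }

    Point get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;  // copy taken under the lock
    }

private:
    mutable std::mutex mutex_;
    Point value_{0, 0};
};

// The writer keeps the invariant y == 2 * x; the reader checks that every
// snapshot it takes satisfies it. Returns true if no torn pair was seen.
bool pair_is_consistent() {
    SharedPoint shared;
    bool ok = true;
    std::thread producer([&shared] {
        for (int i = 1; i <= 10000; ++i) {
            shared.set(i, 2 * i);
        }
    });
    std::thread consumer([&shared, &ok] {
        for (int i = 0; i < 10000; ++i) {
            Point p = shared.get();
            if (p.y != 2 * p.x) {
                ok = false;
            }
        }
    });
    producer.join();
    consumer.join();
    assert(ok);
    return ok;
}
```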