I have a simple C++ code snippet as shown below:
int A;
int B;
void foo() {
    A = B + 1;
    // asm volatile("" ::: "memory");
    B = 0;
}
When I compile this code, the generated assembly code is reordered as follows:
foo():
        mov     eax, DWORD PTR B[rip]
        mov     DWORD PTR B[rip], 0
        add     eax, 1
        mov     DWORD PTR A[rip], eax
        ret
B:
        .zero   4
A:
        .zero   4
However, when I add a memory fence (the commented-out line in the C++ code), the instructions are not reordered. My understanding is that adding a volatile qualifier to a variable should also prevent instruction reordering. So, I modified the code to add volatile to variable B:
int A;
volatile int B;
void foo() {
    A = B + 1;
    B = 0;
}
To my surprise, the generated assembly code still shows reordered instructions. Can someone explain why the volatile qualifier did not prevent instruction reordering in this case?
The code is available on Godbolt.
My understanding is that adding a volatile qualifier to a variable should also prevent instruction reordering.
That's a major oversimplification. Although the C++ standard doesn't define the semantics of volatile very explicitly (saying only that "accesses are evaluated strictly according to the rules of the abstract machine"), the unwritten rule is that volatile objects are treated as if some external entity (e.g. I/O hardware) may be reading and writing them asynchronously, and that both reads and writes are side effects that the external entity can observe. As such, each read/write to a volatile object (of machine word size or less) should result in the execution of exactly one load/store instruction.
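To make the "one access, one instruction" rule concrete, here is a minimal sketch. The function name read_twice and the idea that the pointer refers to a device register are made up for illustration; they are not from your code.

```cpp
#include <cstdint>

// Hypothetical helper: `reg` stands in for something like a memory-mapped
// device register that an external entity may change at any time.
std::uint32_t read_twice(volatile std::uint32_t* reg) {
    std::uint32_t first = *reg;   // one volatile read: one load
    std::uint32_t second = *reg;  // a second volatile read: not merged with the first
    return first + second;
}
```

If reg pointed to a plain (non-volatile) object, the compiler would be entitled to load once and double the value. With volatile, a typical compiler emits two separate loads, because each read counts as an observable side effect.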
From this it follows that loads and stores to volatile objects will not be reordered with each other. But in your program A is not volatile, so we assume that the external entity does not see it. Therefore it does not matter how the accesses to A are ordered with respect to accesses to B or anything else, and the compiler is free to reorder them. Instructions like add eax, 1 that do not access memory at all are also fair game; the external entity can't see the machine registers either.
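For comparison, here is a sketch of your function with both variables declared volatile. This is only meant to illustrate the rule above, not a recommendation for code shared between threads.

```cpp
// Both objects are volatile, so every access to them is an observable side effect.
volatile int A;
volatile int B;

void foo() {
    A = B + 1;  // volatile load of B, then volatile store to A
    B = 0;      // volatile store to B; the compiler may not move it above the store to A
}
```

Now the store to A has to appear before the store to B in the generated code, because both are volatile accesses and must happen in program order. Note that this constrains only the compiler: volatile does not emit any barrier instructions, so on a weakly ordered CPU the hardware could still make the stores visible to other cores in a different order.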
Per your use of the concurrency tag, this is one of the many reasons that volatile is not the right approach for variables to be shared between threads - because unlike the "external entity", another thread does have access to your non-volatile variables. In olden times prior to C++11, people used volatile because it was all there was, and you could make it work, with the use of explicit memory barrier functions, if you knew something about the way your compiler did optimizations (which was usually undocumented). Since C++11 we have std::atomic and that is the only right way to handle inter-thread sharing, but unfortunately the association with volatile lingers on in obsolete docs and the minds of old-timers. See Why is volatile not considered useful in multithreaded C or C++ programming? for more.
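If the goal is for another thread to see the write to A once it sees B become 0, a minimal std::atomic sketch could look like the following. The initial value 41, the reader function try_read, and the assumption that foo() is called only once are all hypothetical choices made for this example.

```cpp
#include <atomic>

int A = 0;               // ordinary data, published through B
std::atomic<int> B{41};  // hypothetical initial value, for illustration only

void foo() {
    A = B.load(std::memory_order_relaxed) + 1;
    // Release store: the write to A above cannot be reordered after it.
    B.store(0, std::memory_order_release);
}

// Hypothetical reader, meant to run in another thread.
bool try_read(int& out) {
    if (B.load(std::memory_order_acquire) != 0) {
        return false;    // foo() has not published its result yet
    }
    out = A;             // safe: the acquire load synchronizes with the release store
    return true;
}
```

Here the release/acquire pair gives the ordering guarantee you were hoping to get from volatile, and it applies to the hardware as well as the compiler. A itself can stay a plain int because it is only read after the acquire load has observed the release store.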
Also relevant: Does the C++ volatile keyword introduce a memory fence? (No, it does not, as you have discovered.)