Suppose INTENABLE is a microcontroller register that enables/disables interrupts, and I have it declared somewhere in my libraries as a volatile variable located at the appropriate address. my_var is some variable that is modified within one or more interrupt handlers, as well as within my_func.
Within my_func I would like to perform an operation on my_var that reads and then writes it (such as +=) atomically, in the sense that it must happen entirely before or after an interrupt: an interrupt cannot occur while it is going on.
What I would usually have then is something like this:
int my_var = 0;

void my_interrupt_handler(void)
{
    // ...
    my_var += 3;
    // ...
}

int my_func(void)
{
    // ...
    INTENABLE = 0;
    my_var += 5;
    INTENABLE = 1;
    // ...
}
If I'm understanding things correctly, if my_var were declared volatile, then my_var would be guaranteed to be "cleanly" updated (which is to say that the interrupt could not update my_var in between my_func's read and write of it), because the C standard guarantees that volatile memory accesses happen in order.
The part I would like some confirmation on is when it is not declared volatile. In that case, the compiler does not guarantee that the update happens with interrupts disabled; is that correct?
I'm wondering because I have written similar code (with non-volatile variables), with the difference that I disable interrupts through a function from another compilation unit (some library's file). If I am understanding things correctly, the likely reason that worked was that the compiler cannot assume the variable is not read or modified by calls outside the compilation unit. Therefore, if I compiled with, say, GCC's -flto, reordering outside the critical region (bad things) could happen. Do I have this right?
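For concreteness, the pattern I mean looks roughly like this; library_disable_interrupts()/library_enable_interrupts() are hypothetical stand-ins for whatever the library actually provides, defined in a separately compiled file:

/* Hypothetical library functions, defined in a separately compiled file: */
void library_disable_interrupts(void);
void library_enable_interrupts(void);

int my_var = 0;   /* note: not volatile */

void my_func(void)
{
    library_disable_interrupts();  /* opaque call: without LTO the compiler must     */
    my_var += 5;                   /* assume my_var may be read or written by it, so */
    library_enable_interrupts();   /* the update stays between the two calls         */
}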
EDIT:
Thanks to Lundin's comment I realized that, in my head, I had mixed together the case where I disable a peripheral's interrupt register with the case where I use a specific assembly instruction to disable all interrupts on the processor.
I would imagine the instruction that enables/disables processor interrupts prevents other instructions from being reordered from before it to after it (or vice versa), but I still do not know for sure whether that is true.
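For reference, on a Cortex-M the instruction pair in question is CPSID i / CPSIE i, and the usual GCC spelling wraps them in inline asm with a "memory" clobber (a sketch of the common pattern; the function names below are mine, and CMSIS's __disable_irq()/__enable_irq() are, as far as I know, essentially this). Whether it is the instruction itself or the "memory" clobber that prevents the compiler from reordering is exactly the part I am unsure about:

/* Sketch of the common GCC pattern on Cortex-M (function names are mine): */
static inline void cpu_irq_disable(void)
{
    __asm volatile ("cpsid i" ::: "memory");  /* mask all interrupts */
}

static inline void cpu_irq_enable(void)
{
    __asm volatile ("cpsie i" ::: "memory");  /* unmask interrupts */
}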
EDIT 2:
Regarding volatile accesses: because I wasn't clear on whether reordering around volatile accesses was something not allowed by the standard, something that was allowed but didn't happen in practice, or something that was allowed and did happen in practice, I came up with a small test program:
volatile int my_volatile_var;
int my_non_volatile_var;

void my_func(void)
{
    my_volatile_var = 1;
    my_non_volatile_var += 2;
    my_volatile_var = 0;
    my_non_volatile_var += 2;
}
Using arm-none-eabi-gcc version 7.3.1 to compile with -O2 for a Cortex-M0 (arm-none-eabi-gcc -O2 -mcpu=cortex-m0 -c example.c), I get the following assembly:
movs r2, #1
movs r1, #0
ldr r3, [pc, #12] ; (14 <my_func+0x14>)
str r2, [r3, #0]
ldr r2, [pc, #12] ; (18 <my_func+0x18>)
str r1, [r3, #0]
ldr r3, [r2, #0]
adds r3, #4
str r3, [r2, #0]
bx lr
Here you can clearly see that the two my_non_volatile_var += 2 statements were merged into a single update (the adds r3, #4) which happens after both volatile accesses. This means that GCC does indeed reorder non-volatile accesses around volatile ones when optimizing (and I'm going to go ahead and assume this means it is allowed by the standard).
C/C++ volatile has a very narrow range of guaranteed uses: to interact with the outside world directly (signal handlers written in C/C++ are "outside" when they are called asynchronously); that's why volatile object accesses are defined as observables, just like console I/O and the exit value of the program (the return value of main).
A way to see it is to imagine that every volatile access is actually translated into I/O on a special console, terminal, or pair of FIFO devices named Accesses and Values, where:

- a write x = v; to an object x of type T is translated into writing to the FIFO Accesses a write order specified as a 4-tuple ("write", T, &x, v);
- a read of x is translated into writing to Accesses a 3-tuple ("read", T, &x) and then waiting for the value on Values.

This way, volatile is exactly like an interactive console.
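Purely as an illustration of that analogy (the model_* helpers are made up here; no compiler implements volatile this way), one can spell the translation out in code:

#include <stdio.h>

volatile int x;

/* models the volatile write  x = v;  as an order on the Accesses FIFO */
static void model_volatile_write(int v)
{
    printf("Accesses <- (\"write\", int, %p, %d)\n", (void *)&x, v);
}

/* models a volatile read of x: an order on Accesses, then a wait on Values */
static int model_volatile_read(void)
{
    printf("Accesses <- (\"read\", int, %p)\n", (void *)&x);
    return 42;  /* stands in for whatever value arrives on Values */
}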
A nice specification of volatile is the ptrace semantic (which nobody but me uses, but it's still the nicest volatile specification ever): at every point(**) where a volatile object is accessed, you can stop the program, as a debugger using ptrace would, and observe or even change(*) the values of the volatile objects.
It means that you have a well defined, ptrace-observable state of the volatile objects at these points, period.
(*) But you may not set a volatile object to an invalid bit pattern with ptrace: the compiler can assume that any object has a legal bit pattern as defined by the ABI. All uses of ptrace to access volatile state must follow the ABI specification of objects shared with separately compiled code. For example a compiler can assume that a volatile number object doesn't have a negative zero value if the ABI doesn't allow it. (Obviously negative zero is a valid state, semantically distinct from positive zero, for IEEE floats.)
(**) Inlining and loop unrolling can generate many points in assembly/binary code corresponding to a single C/C++ source point; debuggers handle that by setting many PC-level breakpoints for one source-level breakpoint.
The ptrace semantic doesn't even imply that a volatile local variable is stored on the stack rather than in a register; it implies that the location of the variable, as described in the debugging data, is modifiable: either in addressable memory via its stable address on the stack (stable for the duration of the function call, obviously), or in the representation of the saved registers of a paused program, which is a temporary, complete copy of the registers as saved by the scheduler when a thread of execution is paused.
[In practice all compilers provide a stronger guarantee than the ptrace semantic: that all volatile objects have a stable address even if their address is never taken in the C/C++ code; this guarantee is sometimes not useful and strictly pessimistic. The lighter ptrace-semantic guarantee is extremely useful in itself for automatic variables kept in registers in "high level assembly".]
You can't examine a running program (or thread) without stopping it; you cannot observe from any CPU without synchronization (ptrace provides such synchronization).
These guarantees hold at any optimization level. At minimum optimization, all variables are in fact practically volatile and the program can be stopped at any expression.
At higher optimization levels, computations are reduced and variables can even be optimized out if they hold no useful information for any legal run; the most obvious case is a "quasi-const" variable, which isn't declared const but is used as if it were: set once and never changed. Such a variable carries no information at runtime if the expression that was used to set it can be recomputed later.
Many variables that carry useful information still have a limited range: if there is no expression in a program that can set an object of signed integer type to a mathematically negative result (a result that is truly negative, not negative because of overflow in a two's-complement system), the compiler can assume that such objects never hold negative values. Any attempt to set them to a negative value in the debugger, or via ptrace, would be unsupported, as the compiler can generate code that integrates that assumption; making the object volatile would force the compiler to allow any possible legal value for the object, even if only assignments of positive values are present in the complete code (the code in all paths that can access that object, in every TU (translation unit) that can access the object).
Note that for any object that is shared beyond the set of collectively translated code (all TUs that are compiled and optimized together), nothing about the possible values of the object can be assumed besides what the applicable ABI allows.
The trap (not "trap" as in computing) is to expect Java-volatile-like semantics, at least in single-CPU, linear, ordered programming (where there is by definition no out-of-order execution, as there is only one point of view on the state: that of the one and only CPU):
int *volatile p = 0;
p = new int(1);
There is no volatile guarantee that p can only be null or point to an object with value 1: there is no volatile ordering implied between the initialization of the int and the setting of the volatile object, so an async signal handler or a breakpoint on the volatile assignment may not see the int initialized.
But the volatile pointer may not be modified speculatively: until the compiler obtains the guarantee that the rhs (right-hand-side) expression will not throw an exception (and thus leave p untouched), it cannot modify the volatile object (as a volatile access is an observable by definition).
Going back to your code:
INTENABLE = 0; // volatile write (A)
my_var += 5; // normal write
INTENABLE = 1; // volatile write (B)
Here INTENABLE is volatile, so all accesses to it are observable; the compiler must produce exactly those side effects. The normal writes are internal to the abstract machine, and the compiler only needs to preserve their effect with respect to producing the correct result, without accounting for any signals, which are outside the abstract semantics of C/C++.
In terms of ptrace semantics, you can set a breakpoint at points (A) and (B) and observe or change the value of INTENABLE, but that's all. my_var may not be optimized out completely, as it is accessible by outside code (the signal handling code), but there is nothing else in that function that can access it, so the concrete representation of my_var doesn't have to match its value according to the abstract machine at those points.
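To make that concrete, here is a hand-written illustration, not actual compiler output, of a transformation those rules permit (and which is in line with what the EDIT 2 experiment showed):

/* A rewrite of my_func that the optimizer is allowed to produce: */
int tmp = my_var;    /* read hoisted above the volatile write (A)      */
INTENABLE = 0;       /* volatile write (A)                             */
INTENABLE = 1;       /* volatile write (B)                             */
my_var = tmp + 5;    /* write sunk below (B): the update is no longer  */
                     /* inside the interrupts-disabled window          */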
It's different if you have a call to a truly external (not analyzable by the compiler, outside the "collectively translated code") do-nothing function in between:
INTENABLE = 0;     // volatile write (A)
external_func_1(); // actual NOP, but could access my_var
my_var += 5;       // normal write
external_func_2(); // actual NOP, but could access my_var
INTENABLE = 1;     // volatile write (B)
Note that both of these calls to do-nothing-but-possibly-do-anything external functions are needed:

- external_func_1() possibly observes the previous value of my_var;
- external_func_2() possibly observes the new value of my_var.

These calls to external, separately compiled NOP functions have to be made according to the ABI; thus all globally accessible objects must carry the ABI representation of their abstract-machine value at the call: the objects must reach their canonical state, unlike in the optimized state, where the optimizer knows that the concrete memory representation of some objects has not yet reached the value of the abstract machine.
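A minimal sketch of such functions, with hypothetical names for the separate file; the only thing that matters is that the optimizer cannot see their bodies when compiling my_func (which is precisely what -flto would break):

/* external_nops.c -- compiled separately from the file containing my_func */
void external_func_1(void) { /* genuinely does nothing */ }
void external_func_2(void) { /* genuinely does nothing */ }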
In GCC such a do-nothing external function can be spelled either asm("" : : : "memory"); or just asm(""); . The "memory" clobber is vaguely specified, but it clearly means "accesses anything in memory whose address has been leaked globally".
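Applied to the original code, the pattern looks like this (a sketch using the "memory" form; the declarations are assumed to come from the library headers):

extern volatile unsigned INTENABLE;   /* assumed declared in the library headers */
extern int my_var;

void my_func_with_barriers(void)
{
    INTENABLE = 0;            // volatile write (A)
    asm("" : : : "memory");   // barrier: plays the role of external_func_1()
    my_var += 5;              // normal write, now pinned inside the window
    asm("" : : : "memory");   // barrier: plays the role of external_func_2()
    INTENABLE = 1;            // volatile write (B)
}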
[Note that here I'm relying on the transparent intent of the specification and not on its words, as the words are very often badly chosen(#) and not used by anyone to build an implementation anyway; only people's opinions count, the words never do.
(#) At least in the world of common programming languages, where people don't have the qualification to write formal or even correct specifications.]