Reading the answers to this question, it came to my attention that register is no longer a valid storage specifier in C++17. Some comments even suggest that compilers had already been ignoring register for some time.
I use GCC 6.x with an ARM Cortex-M MCU and have a piece of code with inline assembly that absolutely needs to have a variable in a register. Previously I had assumed that the register keyword would do this for me, but apparently it doesn't.
EDIT: Why do I need to store something in a register?
I'm implementing a lock-free ring buffer with the ARM LDREX / STREX instructions. I need to store the result of the ARM LDREX instruction in a register, because storing it in memory would defeat the whole mechanism on a Cortex-M.
EDIT: Example code.
This is a code snippet cut from the ring buffer to illustrate the point of the question. The points of interest are __LDREXW, __STREXW and __CLREX, which are all defined in cmsis_gcc.h. They are intrinsic functions for the ARM synchronization primitives. I use them to implement a lock-free mechanism.
    template<typename T, uint32_t maxCount>
    class RingBuffer final {
        __attribute__((aligned(8)))
        T buffer[maxCount];
        uint32_t start;
        uint32_t end;

        bool pushBack(const T &item) {
            register uint32_t exclusiveEnd;
            register uint32_t oldEnd;
            do {
                // Load current end value exclusively
                exclusiveEnd = __LDREXW(&end);
                __DMB();
                // Remember old end value so that
                // we can store the item at that location
                oldEnd = exclusiveEnd;
                // Check if ring buffer is full
                if (isFull()) {
                    __CLREX();
                    __DMB();
                    return false;
                }
                // Figure out correct new value
                if (exclusiveEnd == (maxCount - 1)) {
                    exclusiveEnd = 0;
                }
                else {
                    exclusiveEnd++;
                }
                // Attempt to store new end value
            } while (0 != __STREXW(exclusiveEnd, &end));
            __CLREX();
            __DMB();
            // Store new item
            //memcpy(buffer + oldEnd, &item, sizeof(T));
            buffer[oldEnd] = item;
            return true;
        }
        // ... other methods ...
    };
Why the LDREX result must be stored in a register:
On the Cortex-M4 the implemented exclusives reservation granule is the entire memory address range (quoted from Cortex-M4 TRM), which means that if the variable storing the LDREX result ends up in memory instead of a register, then the following STREX will always fail.
NOTE: this code runs on "bare-metal" hardware, there is no operating system, etc.
What is the correct way to tell the compiler that I want a variable to be always stored in a register?
You can't do that (in portable standard C++ or C code). You need to trust your compiler, so you should not even want to do that.
Notice that:
Recent C & C++ standards (e.g. C11 or C++14 or C++17) don't speak of processor registers in an imperative way, and they mention that the register keyword was (in the previous century) only a hint for compilers.
Some processors (at least in the past) don't even have any real programmer-accessible processor registers.
Most importantly, you should trust your compiler for good enough optimizations and in some cases putting a value in a register is not the best for performance (in particular, because that register could be better used for some other value).
However, as an extension the GCC compiler enables you to put a variable in a specified register. I don't recommend using that without very good reasons (at least be sure to benchmark your code with and without using that feature).
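For illustration, the GCC extension mentioned above (explicit register variables) looks roughly like the sketch below. The function name incrementInRegister and the choice of r4 are made up for this example. As far as I know, GCC only documents local register variables as reliable when they are used as operands of extended asm, which is why the value is touched inside an asm statement. The non-ARM branch exists only so that the sketch also builds on a development host.

```cpp
#include <cstdint>

// Hypothetical helper: adds 1 to `input` while the value lives in r4.
// The register choice (r4) is an assumption made for this example only.
inline std::uint32_t incrementInRegister(std::uint32_t input) {
#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__)
    // GCC extension: bind a local variable to a specific register.
    register std::uint32_t value asm("r4") = input;
    // Using the variable as an asm operand is the supported use case;
    // "+r" means it is both read and written in a register.
    asm volatile("add %0, %0, #1" : "+r"(value));
    return value;
#else
    // Fallback for hosts without the ARMv7-M register set.
    return input + 1u;
#endif
}
```

Calling incrementInRegister(41u) should give 42 on either branch; on the ARM branch the generated assembly (g++ -S) is the place to check that r4 is really used.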
You really need to understand that current compilers optimize, most of the time, better than you can. Be sure to benchmark your code (e.g. compiled with g++ -O3 and an appropriate -mtune= argument) before trying to optimize by hand. For performance-sensitive routines, also examine the generated assembler code (e.g. with g++ -O3 -fverbose-asm -S).
Regarding your statement "On the Cortex-M4 the implemented exclusives reservation granule is the entire memory address range (quoted from Cortex-M4 TRM)": in that case I recommend either using a small piece of extended assembler code (for GCC) or, if absolutely necessary, declaring a variable in a specified register.
Perhaps you also need to compile all your code (including any library you use, including the standard C and C++ libraries!) with the -ffixed-reg option.
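The extended-assembler route could look like the following sketch. It is not taken from the question's ring buffer: exclusiveIncrement is a hypothetical helper that only bumps a counter. The point it shows is that every operand uses an "r" constraint, so the values are held in registers for the whole LDREX/STREX sequence. The ARM branch is guarded by the architecture macros GCC defines for ARMv7-M and ARMv7E-M builds; the fallback uses GCC's __atomic_fetch_add builtin so that the sketch compiles elsewhere.

```cpp
#include <cstdint>

// Hypothetical helper: atomically increments *addr and returns the old value.
inline std::uint32_t exclusiveIncrement(volatile std::uint32_t *addr) {
#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__)
    std::uint32_t oldValue;  // result of LDREX, kept in a register
    std::uint32_t newValue;  // value handed to STREX
    std::uint32_t failed;    // STREX status: 0 means the store succeeded
    asm volatile(
        "1: ldrex %0, [%3]     \n"  // exclusive load
        "   add   %1, %0, #1   \n"  // compute the new value
        "   strex %2, %1, [%3] \n"  // exclusive store attempt
        "   cmp   %2, #0       \n"
        "   bne   1b           \n"  // retry if the reservation was lost
        "   dmb                \n"  // barrier after the update
        : "=&r"(oldValue), "=&r"(newValue), "=&r"(failed)
        : "r"(addr)
        : "cc", "memory");
    return oldValue;
#else
    // Fallback so the sketch also builds on a non-ARM host.
    return __atomic_fetch_add(addr, 1u, __ATOMIC_SEQ_CST);
#endif
}
```

The early-clobber outputs ("=&r") keep the compiler from reusing the address register for a result, which STREX would not accept.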
But I insist: you need to trust your compiler more than you currently do. Are you sure you can't find (and perhaps configure and build from source) a recent GCC (e.g. GCC 7) which enables, as a builtin or something else, your low-level synchronization mechanism?
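One such builtin route is std::atomic, which GCC implements on top of its __atomic builtins. As far as I know, GCC lowers a 32-bit compare-exchange to an LDREX/STREX loop when targeting Cortex-M3/M4, so no register pinning is needed. The sketch below is a hypothetical rework of pushBack, not the original class: the name AtomicRingBuffer, the memory orders, and the "full when the next end would equal start" rule are assumptions, since isFull() is not shown in the question.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical rework of the question's ring buffer using std::atomic.
template <typename T, std::uint32_t maxCount>
class AtomicRingBuffer final {
public:
    bool pushBack(const T &item) {
        std::uint32_t oldEnd = end.load(std::memory_order_relaxed);
        std::uint32_t newEnd;
        do {
            // Figure out the new end value, wrapping around at maxCount
            newEnd = (oldEnd == maxCount - 1) ? 0 : oldEnd + 1;
            // Assumed fullness rule: one slot is always kept free
            if (newEnd == start.load(std::memory_order_acquire)) {
                return false;
            }
            // On failure, compare_exchange_weak reloads oldEnd and we retry
        } while (!end.compare_exchange_weak(oldEnd, newEnd,
                                            std::memory_order_acq_rel,
                                            std::memory_order_relaxed));
        // Store the new item in the slot that was just reserved
        buffer[oldEnd] = item;
        return true;
    }
    // ... other methods (e.g. a consumer that advances start) ...

private:
    T buffer[maxCount];
    std::atomic<std::uint32_t> start{0};
    std::atomic<std::uint32_t> end{0};
};
```

With AtomicRingBuffer<int, 4>, three pushBack calls succeed and the fourth returns false, because one slot stays free under the assumed fullness rule. Like the original, the item is written after the index is updated, so a real consumer would still need a way to know when the slot is valid.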