c++ gcc arm inline-assembly cortex-m

What is the correct way to tell the compiler that I want a variable to be always stored in a register?


While reading the answers to this question, I learned that register is no longer a valid storage class specifier in C++17. Some comments even suggest that compilers had already been ignoring register for some time.

I use GCC 6.x with an ARM Cortex-M MCU and have a piece of code with inline assembly which absolutely needs to have a variable in a register. Previously I had assumed that the register keyword would do this for me, but apparently it doesn't.

  • In modern C++, what is the correct way to ensure that the compiler always uses a register for a given variable?
  • If there isn't a standard way, is there a GCC-specific way of doing this? Maybe some sort of attribute? Or a compiler specific keyword?

EDIT: Why do I need to store something in a register?
I'm implementing a lock-free ring buffer with the ARM LDREX / STREX instructions. I need to store the result of the ARM LDREX instruction in a register, because storing it in memory would defeat the whole mechanism on a Cortex-M.

EDIT: Example code.

This is a code snippet cut from the ring buffer to illustrate the point of the question. The points of interest are __LDREXW, __STREXW and __CLREX which are all defined in cmsis_gcc.h. They are intrinsic functions of the ARM synchronization primitives. I use them to implement a lock-free mechanism.

#include <cstdint>  // uint32_t

template<typename T, uint32_t maxCount>
class RingBuffer final {

    __attribute__((aligned(8)))
    T buffer[maxCount];
    uint32_t start;
    uint32_t end;

    bool pushBack(const T &item) {
        register uint32_t exclusiveEnd;
        register uint32_t oldEnd;

        do {
            // Load current end value exclusively
            exclusiveEnd = __LDREXW(&end);
            __DMB();

            // Remember old end value so that
            // we can store the item at that location
            oldEnd = exclusiveEnd;

            // Check if ring buffer is full
            if (isFull()) {
                __CLREX();
                __DMB();
                return false;
            }

            // Figure out correct new value
            if (exclusiveEnd == (maxCount - 1)) {
                exclusiveEnd = 0;
            }
            else {
                exclusiveEnd++;
            }

            // Attempt to store new end value
        } while (0 != __STREXW(exclusiveEnd, &end));
        __CLREX();
        __DMB();

        // Store new item
        //memcpy(buffer + oldEnd, &item, sizeof(T));
        buffer[oldEnd] = item;
        return true;
    }

    // ... other methods ...

};

Why the LDREX result must be stored in a register:

On the Cortex-M4 the implemented exclusives reservation granule is the entire memory address range (quoted from Cortex-M4 TRM), which means if the variable storing the LDREX result ends up in memory instead of a register, then the following STREX will always fail.

NOTE: this code runs on "bare-metal" hardware; there is no operating system, etc.


Solution

  • What is the correct way to tell the compiler that I want a variable to be always stored in a register?

    You can't do that (in portable standard C++ or C code). You need to trust your compiler, so you should not even want to do that.

    Notice that:

    • Recent C and C++ standards (e.g. C11, C++14 or C++17) don't speak of processor registers in an imperative way, and even when the register keyword was in common use (in the previous century) it was only a hint to the compiler.

    • Some processors (at least in the past) didn't even have any real programmer-accessible registers.

    • Most importantly, you should trust your compiler to optimize well enough. In some cases putting a value in a register is not the best choice for performance, in particular because that register could be better used for some other value.

    However, as an extension the GCC compiler enables you to put a variable in a specified register. I don't recommend using that without very good reasons (at least be sure to benchmark your code with and without using that feature).

    You really need to understand that current compilers optimize better than you can most of the time. Be sure to benchmark your code (e.g. compiled with g++ -O3 and an appropriate -mtune= argument) before trying to optimize by hand. For performance-sensitive routines, also examine the generated assembler code (e.g. with g++ -O3 -fverbose-asm -S).

    You wrote: "On the Cortex-M4 the implemented exclusives reservation granule is the entire memory address range (quoted from Cortex-M4 TRM)".

    Then I recommend either using a small piece of extended assembler code (for GCC) or, if absolutely necessary, declaring a variable in a specified register.

    Perhaps you also need to compile all your code (including any library you use, even the standard C and C++ libraries!) with the -ffixed-reg option, where reg is the name of the register you want the compiler to leave alone.
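The first option above, a small piece of extended assembler, might look like the following minimal sketch. The helper name exclusiveIncrement is hypothetical and not part of CMSIS. The "r" constraints ask GCC to place each operand in a general-purpose register for the duration of the asm statement, and keeping LDREX and STREX inside a single statement means the compiler cannot insert a spill to memory between them. Barriers such as __DMB are left out here, and the non-ARM branch exists only so the sketch compiles off-target.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a CMSIS function): increments *addr and
// returns the new value.
inline std::uint32_t exclusiveIncrement(volatile std::uint32_t* addr)
{
    assert(addr != nullptr);
#if defined(__GNUC__) && defined(__arm__)
    std::uint32_t value;   // loaded value, kept in a register by "=&r"
    std::uint32_t failed;  // STREX status: 0 means the store succeeded
    do {
        __asm__ volatile(
            "ldrex %0, [%2]\n\t"      // exclusive load into a register
            "add   %0, %0, #1\n\t"    // modify the value in that register
            "strex %1, %0, [%2]"      // exclusive store, status in %1
            : "=&r"(value), "=&r"(failed)
            : "r"(addr)
            : "memory");
    } while (failed != 0);
    return value;
#else
    // Assumption: host-side fallback for building off-target. NOT atomic.
    std::uint32_t value = *addr + 1;
    *addr = value;
    return value;
#endif
}
```

With this shape there is no need for the register keyword at all, because the operand constraints say where the values must live while the instructions run.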

    But I insist: you need to trust your compiler more than you currently do. Are you sure you can't find (and perhaps configure and build from source) a recent GCC (e.g. GCC 7) which provides your low-level synchronization mechanism as a builtin or in some other form?
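As one possible reading of "a builtin or something else", here is a hedged sketch that uses std::atomic from the standard library instead of calling the LDREX / STREX intrinsics by hand. The function name claimSlot and the "full" test (one slot always left empty) are my assumptions, since isFull() is not shown in the question. On ARMv7-M targets such as the Cortex-M4, GCC can usually lower the compare-exchange to an LDREX / STREX loop and decides register allocation itself, but check the generated assembly for your toolchain. Like the original snippet, this only reserves an index; publishing the stored item to the consumer is a separate concern.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: atomically reserves the slot at `end` and advances
// `end` with wrap-around. Returns false if the buffer is full.
template <std::uint32_t maxCount>
bool claimSlot(std::atomic<std::uint32_t>& end,
               const std::atomic<std::uint32_t>& start,
               std::uint32_t& slot)
{
    static_assert(maxCount > 1, "need at least two slots");

    std::uint32_t oldEnd = end.load(std::memory_order_relaxed);
    for (;;) {
        assert(oldEnd < maxCount);  // index invariant

        // Figure out the new end value, wrapping at the buffer size.
        const std::uint32_t newEnd = (oldEnd == maxCount - 1) ? 0 : oldEnd + 1;

        // Assumed "full" rule: advancing end onto start means no room left.
        if (newEnd == start.load(std::memory_order_acquire)) {
            return false;
        }

        // On failure (including spurious failure) oldEnd is reloaded
        // with the current value and the loop retries.
        if (end.compare_exchange_weak(oldEnd, newEnd,
                                      std::memory_order_acq_rel,
                                      std::memory_order_relaxed)) {
            slot = oldEnd;  // caller stores its item at this index
            return true;
        }
    }
}
```

A caller would invoke something like claimSlot<maxCount>(end, start, slot) and, on success, write buffer[slot] = item, with start and end declared as std::atomic<std::uint32_t> members.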