c performance global-variables c99 restrict

C99: compiler optimizations when accessing global variables and aliased memory pointers


I'm writing C code for an embedded system. In this system, there are memory mapped registers at some fixed address in the memory map, and of course some RAM where my data segment / heap is.

I'm having trouble getting optimal code when my code mixes accesses to global variables in the data segment with accesses to hardware registers. This is a simplified snippet:

#include <stdint.h>

uint32_t * const restrict HWREGS = (uint32_t *)0x20000;

struct {
    uint32_t a, b;
} Context;

void example(void) {
    Context.a = 123;
    HWREGS[0x1234] = 5;
    Context.b = Context.a;
}

This is the code generated on x86 (see also on godbolt):

example:
        mov     DWORD PTR Context[rip], 123
        mov     DWORD PTR ds:149712, 5
        mov     eax, DWORD PTR Context[rip]
        mov     DWORD PTR Context[rip+4], eax
        ret

As you can see, after the hardware register is written, Context.a is reloaded from RAM before being stored into Context.b. This doesn't make sense, because Context is at a different memory address than HWREGS. In other words, the memory pointed to by HWREGS and the memory pointed to by &Context do not alias, but it looks like there is no way to tell the compiler that.

If I change the definition of HWREGS to this:

extern uint32_t * const restrict HWREGS;

that is, I hide the fixed memory address from the compiler, I get this:

example:
        mov     rax, QWORD PTR HWREGS[rip]
        mov     DWORD PTR [rax+18640], 5
        movabs  rax, 528280977531
        mov     QWORD PTR Context[rip], rax
        ret
Context:
        .zero   8

Now the two writes to Context are optimized (even coalesced into a single write), but the access to the hardware register no longer uses a direct memory address; it goes through a pointer indirection instead.

Is there a way to obtain optimal code here? I would like GCC to know that HWREGS is at a fixed memory address and at the same time to tell it that it does not alias Context.


Solution

  • To keep the compiler from repeatedly reloading values from memory because of possible aliasing, the best approach is to avoid global variables, or at least to avoid accessing them directly. The restrict qualifier appears to be ignored on global variables (here, on HWREGS) by both GCC and Clang. Using restrict on function parameters solves the problem:

    #include <stdint.h>
    
    uint32_t * const HWREGS = (uint32_t *)0x20000;
    
    struct Context {
        uint32_t a, b;
    } context;
    
    static inline void exampleWithLocals(uint32_t* restrict localRegs, struct Context* restrict localContext) {
        localContext->a = 123;
        localRegs[0x1234] = 5;
        localContext->b = localContext->a;
    }
    
    void example(void) {
        exampleWithLocals(HWREGS, &context);
    }
    

    Here is the result (see also on godbolt):

    example:
            movabs  rax, 528280977531
            mov     DWORD PTR ds:149712, 5
            mov     QWORD PTR context[rip], rax
            ret
    context:
            .zero   8
    

    Note that the strict aliasing rule does not help in this case, since every variable and field that is read or written has the same type, uint32_t.
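
    To illustrate the point about strict aliasing, here is a minimal sketch. The function names same_type and different_type are hypothetical and not part of the question. Type-based alias analysis only lets the compiler assume that objects of incompatible types do not overlap, so it cannot separate two uint32_t accesses. Whether the reload is actually removed in the second function depends on the compiler and flags (for example -O2 with strict aliasing enabled).

    ```c
    #include <stdint.h>

    /* Both pointers have the same pointed-to type, so the store through
       regs may legally modify *a. The compiler has to reload *a. */
    uint32_t same_type(uint32_t *a, uint32_t *regs) {
        *a = 123;
        *regs = 5;
        return *a;
    }

    /* The pointed-to types are incompatible. Under the strict aliasing rule
       a store through a float pointer cannot legally modify a uint32_t
       object, so the compiler is allowed to return 123 without reloading. */
    uint32_t different_type(uint32_t *a, float *regs) {
        *a = 123;
        *regs = 5.0f;
        return *a;
    }
    ```

    In the question, Context.a and HWREGS[0x1234] are both uint32_t, which corresponds to the first function.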

    Besides this, judging by its name, HWREGS points to hardware registers. Note that the pointed-to data should be declared volatile, so that the compiler does not cache register values in CPU registers or apply similar optimizations (such as assuming the pointed-to value is unchanged as long as the code does not change it).
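
    A minimal sketch of how the volatile qualifier could be combined with the restrict-parameter approach. It assumes a stand-in array, fake_regs (a hypothetical name), so that the code can run on a host machine; on the real target HWREGS would instead be initialized with the fixed address, for example (volatile uint32_t *)0x20000. The generated code has not been checked here and may differ between compilers.

    ```c
    #include <stdint.h>

    /* Hypothetical stand-in for the register block, used only so this sketch
       can run on a host. On the target, use the fixed address instead. */
    static volatile uint32_t fake_regs[0x1235];
    static volatile uint32_t * const HWREGS = fake_regs;

    struct Context {
        uint32_t a, b;
    } context;

    /* restrict states that regs and ctx do not alias; volatile makes each
       register access an observable side effect that must be performed. */
    static inline void exampleWithLocals(volatile uint32_t * restrict regs,
                                         struct Context * restrict ctx) {
        ctx->a = 123;
        regs[0x1234] = 5;
        ctx->b = ctx->a;
    }

    void example(void) {
        exampleWithLocals(HWREGS, &context);
    }
    ```

    Only the register accesses are volatile here; the Context fields stay non-volatile, so the compiler remains free to optimize the accesses to them.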