Tags: c, arm, memory-alignment, armcc

Elegant way to define an automatic variable with specific alignment


I am working with the ARM compiler (armcc) and have a hardware peripheral with direct memory access that requires 32-byte alignment for the memory buffers passed to it. This is not a problem when the buffers are global or static, because they can be defined with the aligned attribute that the compiler supports. The problem arises whenever a buffer has to be defined locally in a function, i.e. with automatic storage duration. I tried something like the following:

typedef struct  __attribute__((aligned(32)))
{
    char bytes[32];
} aligned_t;

_Static_assert(sizeof(aligned_t)==32, "Bad size");

void foo(void)
{
    aligned_t alignedArray[NEEDED_SIZE/sizeof(aligned_t)];
    //.... use alignedArray
}

This compiled and worked happily with an x86 compiler, but not with armcc, which complains:

Warning: #1041-D: alignment for an auto object may not be larger than 8

So this approach does not work. There is another one, which I consider ugly:

void foo(void)
{
    char unalignedBuffer[NEEDED_SIZE + 32 - 1];
    char *pAlignedBuffer = ALIGN_UP_32(unalignedBuffer);
    //.... use pAlignedBuffer
}

Here ALIGN_UP_32 is a macro that returns the first 32-byte-aligned address within unalignedBuffer (the implementation details are not important here, I guess).
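For readers who want to see the workaround end to end, here is a minimal sketch of what such a macro might look like. The question does not show the real implementation, so the macro body, the DMA_ALIGNMENT constant, the NEEDED_SIZE value, and the aligned_view_is_valid function are all assumptions made for illustration. It also assumes a flat address space where a pointer survives the round trip through uintptr_t, which is the usual case on ARM and x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_ALIGNMENT 32u   /* alignment required by the peripheral */
#define NEEDED_SIZE   128u  /* hypothetical payload size, for illustration only */

/* Round an address up to the next multiple of DMA_ALIGNMENT.
   Assumed implementation: add (alignment - 1), then clear the low bits. */
#define ALIGN_UP_32(p) \
    ((char *)(((uintptr_t)(p) + (DMA_ALIGNMENT - 1u)) & \
              ~(uintptr_t)(DMA_ALIGNMENT - 1u)))

/* Hypothetical check: returns 1 if the aligned view lies fully inside
   the over-sized automatic buffer, 0 otherwise. */
int aligned_view_is_valid(void)
{
    /* Worst case the start is off by 31 bytes, so pad by alignment - 1. */
    char unalignedBuffer[NEEDED_SIZE + DMA_ALIGNMENT - 1u];
    char *pAlignedBuffer = ALIGN_UP_32(unalignedBuffer);

    /* The rounded-up address must be a multiple of the alignment. */
    assert(((uintptr_t)pAlignedBuffer % DMA_ALIGNMENT) == 0u);

    /* NEEDED_SIZE bytes starting at the aligned address must still fit. */
    return pAlignedBuffer >= unalignedBuffer &&
           pAlignedBuffer + NEEDED_SIZE <=
               unalignedBuffer + sizeof unalignedBuffer;
}
```

The cost of this scheme is at most 31 wasted bytes of stack per buffer plus one add and one mask at run time.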

As I said, I don't like this approach, and I am wondering whether there is a more elegant way to achieve the same result.


Solution

  • I am working with the ARM compiler

    Have you also tried a recent GCC (perhaps configured as a cross-compiler), e.g. GCC 8 as of November 2018?

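    The ARM ABI does not guarantee that the stack pointer is aligned to 32 bytes. The procedure call standard (AAPCS) only requires 8-byte stack alignment at public interfaces, which matches the limit of 8 quoted in the armcc warning.

    So an automatic variable cannot be relied on to be aligned as strictly as you want.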

    You could avoid automatic variables for these buffers (and systematically use a suitably aligned heap memory zone instead). Or you could allocate more than is needed and do pointer arithmetic on it.

    I feel that your char* pAlignedBuffer = ALIGN_UP_32(unalignedBuffer); is a good approach, and I would expect an optimizing compiler to generate quite efficient code for it.

    I don't like this approach and wondering if there is a more elegant way to achieve the same?

    I believe your approach is good, and any other way would be equivalent.

    PS. Another approach might be to patch your GCC compiler (perhaps with a plugin) to change the default alignment of the stack pointer (hence effectively changing your ABI and calling conventions). That would take you weeks (or months) of effort.
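
The heap-based alternative mentioned above could be sketched roughly as follows. This is only an illustration: dma_alloc and dma_free are hypothetical helper names, not part of armcc or any library, and the code assumes that plain malloc is available on the target and that the alignment is a power of two no smaller than sizeof(void *). It uses the same over-allocate-and-round-up idea as the stack version and stores the raw malloc pointer just below the aligned block so it can be freed later. Overflow of the size computation is not checked.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: return `size` bytes aligned to `alignment`,
   or NULL if malloc fails. Release the block with dma_free only. */
void *dma_alloc(size_t size, size_t alignment)
{
    /* Alignment must be a power of two and large enough that the slot
       holding the saved pointer is itself suitably aligned. */
    assert(alignment >= sizeof(void *) &&
           (alignment & (alignment - 1u)) == 0u);

    /* Room for the payload, worst-case padding, and the saved pointer. */
    void *raw = malloc(size + alignment - 1u + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Skip the slot for the saved pointer, then round up. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1u) & ~(uintptr_t)(alignment - 1u);

    /* Remember what malloc returned, directly below the aligned block. */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Hypothetical counterpart: free a block obtained from dma_alloc. */
void dma_free(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

A caller would write something like void *buf = dma_alloc(NEEDED_SIZE, 32); at the start of foo and dma_free(buf); before returning. Compared with the automatic buffer, this trades a little stack for heap traffic and a possible allocation failure that has to be handled.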