gcc arm atomic inline-assembly

GCC 32 bit arm inline assembly constraints for atomically loading/storing register pairs


32-bit ARM assembly provides several instructions for atomically loading and storing a pair of registers:

  • ldaexd and stlexd (for 32-bit ARMv8, with acquire-release memory ordering) [https://developer.arm.com/documentation/dui0802/b/A32-and-T32-Instructions/LDAEX-and-STLEX]
  • ldrexd and strexd (for ARMv7, without built-in barriers) [https://developer.arm.com/documentation/dui0802/b/A32-and-T32-Instructions/LDREX-and-STREX]

These 32-bit instructions place some requirements on the choice of the transfer register pair (Rt and Rt2):

  • "Rt must be an even numbered register, and not LR"
  • "Rt2 must be R(t+1)"

Below is some example GCC inline assembly code for C/C++ (the problem described is the same for all four instructions). This code does not guarantee the required register numbering: the two "=r" outputs are allocated independently, so nothing forces them into an even/odd pair.

inline static void atomic_exclusive_load_pair_aquire(uint32_t atomic[2], uint32_t target[2])
{
    asm volatile("ldaexd %0, %1, [%2]"  // load-acquire exclusive register pair
                 : "=r"(target[0]),     // first transfer register
                   "=r"(target[1])      // second transfer register
                 : "r"(&atomic[0])      // atomic base register
                 : "memory");           // "memory" acts as compiler r/w barrier
}

I would expect GCC's ARM inline assembly constraints to offer some way of describing dependent register pairs for automatic register allocation, since single instructions require it.

My question is: how can the requirements for the two transfer registers be expressed as GCC inline assembly constraints so that the correct register numbers are chosen automatically? Is this possible at all? Could "multiple alternative constraints" be a solution ([https://gcc.gnu.org/onlinedocs/gcc/Multi-Alternative.html])?

Solution:

As amonakov and others wrote, the solution is to use uint64_t as the transfer type, which occupies a register pair on 32-bit ARM. As long as Thumb mode is disabled, that pair will be an even/odd pair. There are also more or less undocumented inline assembly operand modifiers (such as %Q and %R) for accessing the individual registers of the pair.

inline static void atomic_exclusive_load_pair_aquire(uint32_t atomic[2], uint32_t transfer[2])
{
    uint64_t pair;
    asm volatile("ldaexd %Q[pair], %R[pair], [%[addr]]"  // load-acquire exclusive register pair
                 : [pair] "=r"(pair)       // transfer register pair
                 : [addr] "r"(&atomic[0])  // atomic base register
                 :        "memory");       // "memory" acts as compiler r/w barrier

    transfer[0] = static_cast<uint32_t>(pair);
    transfer[1] = static_cast<uint32_t>(pair >> 32);
}

Please see a full solution with assembly code on godbolt.
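
The store side can be handled the same way. The following is only a sketch under stated assumptions, not code from the original post: the helper name atomic_exclusive_store_pair_release and its uint64_t* signature are hypothetical, the assembly path assumes GCC targeting little-endian ARMv8 AArch32 in ARM (non-Thumb) mode, and every other target falls back to the GCC __atomic_store_n builtin so the snippet still compiles there. The status operand is marked early-clobber ("=&r") because stlexd requires the status register to differ from the transfer and base registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store-release exclusive of a 64-bit value.
   Returns 0 if the store succeeded, 1 if the exclusive monitor was lost
   (for example because no matching ldaexd preceded the store). */
static inline uint32_t atomic_exclusive_store_pair_release(uint64_t *addr,
                                                           uint64_t value)
{
    /* The doubleword exclusive instructions need an 8-byte aligned address. */
    assert(((uintptr_t)addr & 7u) == 0);

#if defined(__arm__) && !defined(__thumb__) && defined(__ARMEL__) && (__ARM_ARCH >= 8)
    uint32_t status;
    /* %Q = register holding the low word, %R = register holding the high word.
       On little-endian ARM mode these are the even and the odd register. */
    __asm__ volatile("stlexd %[status], %Q[pair], %R[pair], [%[addr]]"
                     : [status] "=&r"(status)   /* must not overlap the inputs */
                     : [pair] "r"(value),       /* 64-bit operand: register pair */
                       [addr] "r"(addr)         /* base register */
                     : "memory");               /* compiler r/w barrier */
    return status;
#else
    /* Assumed fallback for non-ARM builds: a plain release store that
       always reports success. */
    __atomic_store_n(addr, value, __ATOMIC_RELEASE);
    return 0;
#endif
}
```

Note that splitting the exclusive load and the exclusive store into separate asm statements lets the compiler place other memory accesses between them, which may clear the exclusive monitor on some implementations; keeping both in a single asm statement is usually the safer choice.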


Solution

  • For such under-documented or even undocumented things you can "peek under the hood" and see how GCC internally describes these instructions in config/arm/sync.md.

    It turns out that binding a DImode (64-bit) operand is sufficient to get an even-odd register pair. In C, you can bind a uint64_t variable and use the H modifier to spell out the second register (the modifiers for Arm are not documented on the GCC side, but LLVM documents them):

    uint64_t f(uint64_t *p)
    {
        uint64_t r;
        asm volatile("ldaexd %0, %H0, [%1]"
                     : "=r"(r)
                     : "r"(p)
                     : "memory");
        return r;
    }
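
    As an illustration of how the load and its matching store fit together, here is a minimal sketch of a 64-bit atomic exchange built from ldrexd/strexd in a single asm statement. It is not taken from the answer: the function name atomic_exchange_u64 is hypothetical, the assembly path assumes GCC targeting ARMv7 or later in ARM (non-Thumb) mode, the operation has relaxed ordering because these instructions include no barriers, and other targets use the __atomic_exchange_n builtin purely so the snippet compiles everywhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: atomically replace *p with desired, return the old value. */
static inline uint64_t atomic_exchange_u64(uint64_t *p, uint64_t desired)
{
    /* ldrexd/strexd need an 8-byte aligned address. */
    assert(((uintptr_t)p & 7u) == 0);

#if defined(__arm__) && !defined(__thumb__) && (__ARM_ARCH >= 7)
    uint64_t old;
    uint32_t failed;
    __asm__ volatile(
        "1: ldrexd %[old], %H[old], [%[addr]]\n\t"             /* exclusive load  */
        "   strexd %[failed], %[val], %H[val], [%[addr]]\n\t"  /* exclusive store */
        "   cmp    %[failed], #0\n\t"                          /* 0 means success */
        "   bne    1b"                                         /* retry on failure */
        : [old] "=&r"(old),        /* early-clobber: written before inputs are read */
          [failed] "=&r"(failed)   /* must differ from the value and base registers */
        : [addr] "r"(p),
          [val] "r"(desired)       /* 64-bit operand: register pair */
        : "cc", "memory");
    return old;
#else
    /* Assumed fallback for non-ARM builds. */
    return __atomic_exchange_n(p, desired, __ATOMIC_RELAXED);
#endif
}
```

    Both outputs are early-clobber because they are written while the address and value registers are still needed on the next loop iteration. Keeping the whole retry loop inside one asm statement also prevents the compiler from inserting memory accesses between the exclusive load and the exclusive store.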