In 32-bit ARM assembly, several instructions are available for atomically loading and storing a pair of registers: LDREXD, STREXD, LDAEXD and STLEXD.
These instructions place some requirements on the choice of the transfer register pair (Rt and Rt2): in ARM state, Rt must be an even-numbered register other than R14, and Rt2 must be R(t+1).
Below is some example GCC inline assembly code (C/C++; the problem described below is the same for all four instructions). This code does not guarantee the required register numbering, because the compiler is free to pick any two registers for the two outputs.
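To make the ARM-state rule concrete, here is a small sketch that encodes it as a predicate over register numbers (0..15 for r0..r15). The function name is hypothetical and it only checks the Rt/Rt2 rule stated above, not the other operand restrictions of these instructions (for example on Rn or the STREXD status register).

```c
#include <stdbool.h>

/* Hypothetical helper: ARM-state (A32) Rt/Rt2 rule for
 * LDREXD, STREXD, LDAEXD and STLEXD. */
static bool is_valid_a32_exclusive_pair(unsigned rt, unsigned rt2)
{
    if (rt > 15 || rt2 > 15)
        return false;          /* not a core register r0..r15 */
    if (rt % 2 != 0)
        return false;          /* Rt must be even-numbered */
    if (rt == 14)
        return false;          /* Rt must not be r14 (LR) */
    return rt2 == rt + 1;      /* Rt2 must be R(t+1) */
}
```

So r0/r1 and r4/r5 are valid pairs, while r1/r2 (odd Rt), r14/r15 (Rt is LR) and r2/r4 (not consecutive) are not. Plain `"=r"` constraints give the compiler no way to know about this rule.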
```c
#include <stdint.h>

inline static void atomic_exclusive_load_pair_acquire(uint32_t atomic[2], uint32_t target[2])
{
    asm volatile("ldaexd %0, %1, [%2]"   // load-acquire exclusive register pair
                 : "=r"(target[0]),      // first transfer register
                   "=r"(target[1])       // second transfer register
                 : "r"(&atomic[0])       // atomic base register
                 : "memory");            // "memory" acts as compiler r/w barrier
}
```
I would expect GCC's ARM inline assembly constraints to offer some way of describing dependent register pairs, so that the compiler can allocate them automatically when a single instruction requires it.
My question is: how can the requirements on the two transfer registers be expressed as GCC inline assembly constraints so that the compiler chooses valid register numbers automatically? Is this possible at all? Could "multiple alternative constraints" (https://gcc.gnu.org/onlinedocs/gcc/Multi-Alternative.html) be a solution?
Solution:
As amonakov and others wrote, the solution is to use uint64_t as the transfer type, which GCC keeps in a register pair on 32-bit ARM. When compiling for ARM (not Thumb) state, that pair is an even/odd pair, as the ARM encodings require. There are also largely undocumented operand modifiers (%Q and %R for the low and high halves) for naming the individual registers of the pair.
```cpp
#include <stdint.h>

inline static void atomic_exclusive_load_pair_acquire(uint32_t atomic[2], uint32_t transfer[2])
{
    uint64_t pair;
    asm volatile("ldaexd %Q[pair], %R[pair], [%[addr]]" // load-acquire exclusive register pair
                 : [pair] "=r"(pair)                     // transfer register pair
                 : [addr] "r"(&atomic[0])                // atomic base register
                 : "memory");                            // "memory" acts as compiler r/w barrier
    transfer[0] = static_cast<uint32_t>(pair);       // low half (%Q)
    transfer[1] = static_cast<uint32_t>(pair >> 32); // high half (%R)
}
```
Please see a full solution with assembly code on godbolt.
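For trying the idea off-target, here is a sketch of the same load in plain C with a portable fallback. Assumptions that are not part of the question: the hypothetical name `load_pair_acquire`, taking a `uint64_t *` instead of `uint32_t[2]` (so the fallback `__atomic_load_n` operates on a properly typed, 8-byte-aligned object), and a preprocessor guard that enables the `ldaexd` path only on little-endian ARMv8-A AArch32. The endianness check is there because `%Q`/`%R` select the halves by significance, not by register number.

```c
#include <stdint.h>

/* Hypothetical helper: 64-bit load-acquire split into two 32-bit words. */
static inline void load_pair_acquire(uint64_t *atomic, uint32_t transfer[2])
{
    uint64_t pair;
#if defined(__arm__) && !defined(__ARM_BIG_ENDIAN) && \
    defined(__ARM_ARCH) && __ARM_ARCH >= 8 && \
    defined(__ARM_ARCH_PROFILE) && __ARM_ARCH_PROFILE == 'A'
    /* Little-endian: %Q (low half) is the lower-numbered register, so this
       emits "ldaexd rN, rN+1, [addr]" with an even/odd pair in ARM state. */
    __asm__ volatile("ldaexd %Q[pair], %R[pair], [%[addr]]"
                     : [pair] "=r"(pair)
                     : [addr] "r"(atomic)
                     : "memory");
#else
    /* Other targets: let the compiler pick its own atomic 64-bit load. */
    pair = __atomic_load_n(atomic, __ATOMIC_ACQUIRE);
#endif
    transfer[0] = (uint32_t)pair;         /* low 32 bits  */
    transfer[1] = (uint32_t)(pair >> 32); /* high 32 bits */
}
```

Splitting by value (`pair` and `pair >> 32`) rather than by memory layout keeps `transfer[0]` as the low half on every host.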
For such under-documented or even undocumented things you can "peek under the hood" and see how GCC internally describes these instructions in config/arm/sync.md.
It turns out that binding a DImode (64-bit) operand is sufficient to get an even/odd register pair. In C, you can bind a uint64_t variable and use the H modifier to spell out the second register (the operand modifiers for Arm are not documented on the GCC side, but LLVM documents them):
```c
#include <stdint.h>

uint64_t f(uint64_t *p)
{
    uint64_t r;
    asm volatile("ldaexd %0, %H0, [%1]" // %0: first register, %H0: second register
                 : "=r"(r)
                 : "r"(p)
                 : "memory");
    return r;
}
```
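The same binding should work for the exclusive stores. Below is a sketch of a store-release counterpart built from an LDREXD/STLEXD retry loop. The function name, the target guard (assuming an ARMv8-A AArch32 target) and the `__atomic_store_n` fallback for other hosts are assumptions for illustration, not part of the answer. The status operand is early-clobber (`=&r`) because STLEXD requires the status register to differ from Rt, Rt2 and Rn.

```c
#include <stdint.h>

/* Hypothetical helper: 64-bit store-release via an exclusive retry loop. */
static inline void atomic_store_pair_release(uint64_t *p, uint64_t value)
{
#if defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 8 && \
    defined(__ARM_ARCH_PROFILE) && __ARM_ARCH_PROFILE == 'A'
    uint64_t old;    /* result of the exclusive load, only used to open the monitor */
    uint32_t status; /* 0 = store succeeded, 1 = exclusive access lost, retry */
    do {
        __asm__ volatile(
            "ldrexd %[old], %H[old], [%[addr]]\n\t"    /* claim the exclusive monitor */
            "stlexd %[st], %[val], %H[val], [%[addr]]" /* store-release the pair */
            : [old] "=&r"(old), [st] "=&r"(status)
            : [val] "r"(value), [addr] "r"(p)
            : "memory");
    } while (status != 0);
    (void)old;
#else
    /* Other targets: let the compiler pick its own atomic 64-bit store. */
    __atomic_store_n(p, value, __ATOMIC_RELEASE);
#endif
}
```

Because `%H` names the second register by number rather than by significance, the `%[val], %H[val]` form keeps Rt2 = R(t+1) on big-endian targets as well, where `%Q`/`%R` would come out in the opposite register order.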