I am looking for a way to automate GCC inline assembly calls through template functions.
For example, I have the following dummy function that stores a value through a pointer. For now I specialize the template function for each type, so every time something changes in the code, I need to change it in every specialization.
```cpp
template <typename T>
void store_ptr(T *location, T value);

template <>
void store_ptr<char>(char *location, char value) {
    __asm__ __volatile__(
        "strb %1, [%0]\n\t"
        : "+r" (location)
        : "r" (value)
        : "memory"
    );
}

template <>
void store_ptr<short>(short *location, short value) {
    __asm__ __volatile__(
        "strh %1, [%0]\n\t"
        : "+r" (location)
        : "r" (value)
        : "memory"
    );
}
```
It would be nice if the template could stringify the instruction suffix ("b", "h", ...) depending on the template type, something like:
```cpp
template <typename T>
void store_ptr(T *location, T value) {
    __asm__ __volatile__(
        "str" stringify_template_type(T) " %1, [%0]\n\t"
        : "+r" (location)
        : "r" (value)
        : "memory"
    );
}
```
Is there a (simple) way of achieving this?
You can't do it with the preprocessor and `sizeof`; the asm template string has to be an actual string literal, not a constant expression involving `sizeof` and ternaries or anything like that. And `sizeof(foo)` isn't available as an integer to the preprocessor for its `#if` conditionals, even without templates.
I don't know how to do this for ARM, but I think the only plausible way is to use something special inside the template string that GCC knows how to expand to a suffix. x86's `%z0` doesn't work for ARM; I tried. But I didn't check the GCC source code for the full list of modifiers.
For x86 AT&T syntax, this is possible with the `z` modifier, which prints the operand-size suffix corresponding to the type of the operand: e.g. `%z0` is `b` if the first operand is a `char`, while regular `%0` might expand to `%al`, a byte register.
You'd use it like this:
```cpp
template <typename T>
void store_ptr(T *location, T value) {
    __asm__ __volatile__(
        "mov%z0 %1, %0"
        : "=m" (*location)  // let the compiler pick an addressing mode
        : "re" (value)      // register or up-to-imm32 source
        : // no "memory" clobber: the compiler knows *location is the only memory written
    );
}
```
e.g.
```cpp
void test(void *dst, int x) {
    store_ptr((int*)dst, x);
    store_ptr((short*)dst, short(x));
    store_ptr((char*)dst, char(-123));
    store_ptr((long*)dst, long(-123));
    store_ptr((long*)dst, long(0x00000000ffffffff)); // doesn't fit in a sign-extended imm32
    store_ptr((unsigned*)dst, unsigned(0xffffffff)); // does fit, uses an immediate
    store_ptr((short*)dst, short(123));
}
```
This compiles correctly on Godbolt with x86-64 GCC (AT&T syntax). (But not with Clang 17, which doesn't seem to understand the `%z0` modifier.) I used the "link to binary" / "compile to binary object" option to verify that this compiler-generated asm also assembles; GCC doesn't check that itself, it just does string substitution into the assembler template, like a printf format string.
```asm
test(void*, int):                # pointer in RDI, int in ESI
        movl    %esi, (%rdi)     # AT&T is op src, dst: opposite of ARM instructions other than str
        movw    %si, (%rdi)
        movb    $-123, (%rdi)
        movq    $-123, (%rdi)    # mov $sign_extended_imm32, m64
        movl    $4294967295, %eax    # put 0x00000000ffffffff into RAX (writing EAX zero-extends)
        movq    %rax, (%rdi)     # the actual inline-asm template; the previous insn set up the "r" operand
        movl    $-1, (%rdi)      # unsigned(0xffffffff) does match an "e" constraint
        movw    $123, (%rdi)
        ret
```
I could have done stuff like `(long*)dst+x` or `dst+8` to get other addressing modes as well, e.g. `(%rdi,%rsi,8)` after sign-extending ESI to RSI.