Tags: c++, c++11, templates, inline-assembly

Stringify template type for inline assembler


I am looking for a way to automate GCC inline-assembly calls through template functions.
For example, I have the following dummy function that stores a value through a pointer. For now I specialize the template function for each type, so every time something in the code changes, I have to change it in every specialization.

template <typename T>
void store_ptr(T *location, T value);

template <>
void store_ptr<char>(char *location, char value) {
  __asm__ __volatile__(
      "strb %1, [%0]\n\t"
      : "+r" (location)
      : "r" (value)
      : "memory"
  );
}


template <>
void store_ptr<short>(short *location, short value) {
  __asm__ __volatile__(
      "strh %1, [%0]\n\t"
      : "+r" (location)
      : "r" (value)
      : "memory"
  );
}

It would be nice if the template could stringify the instruction suffix ("b", "h", ...) depending on the template type.

template <typename T>
void store_ptr(T *location, T value) {
  __asm__ __volatile__(
     "str" stringify_template_type(T) " %1, [%0]\n\t"
     : "+r" (location)
     : "r" (value)
     : "memory"
     );
}

Is there a (simple) way of achieving this?


Solution

  • You can't do it with the preprocessor and sizeof: the asm template has to be an actual string literal, not a constant expression involving sizeof and ternaries or anything like that. And sizeof(foo) isn't available to the preprocessor as an integer for its #if conditionals, even without templates in the picture.

    I don't know how to do this for ARM; the only plausible way I can think of is some special token inside the template string that GCC knows how to expand to a suffix. x86's %z0 doesn't work for ARM (I tried), but I didn't check the GCC source code for the full list of ARM operand modifiers. (If C++17 is an option, if constexpr sidesteps the stringification problem entirely; see the sketch below.)
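
    A minimal sketch of that if constexpr workaround, assuming GCC targeting 32-bit ARM and C++17 (newer than the question's c++11 tag): each branch carries its own string-literal asm template, and the branches not taken for a given T are discarded at instantiation time, so nothing ever needs to be stringified.

    #include <type_traits>

    // Sketch only: requires C++17 if constexpr; ARM asm as in the question.
    template <typename T>
    void store_ptr(T *location, T value) {
      static_assert(std::is_integral_v<T> && sizeof(T) <= 4,
                    "only 8/16/32-bit integer stores handled in this sketch");
      if constexpr (sizeof(T) == 1) {
        __asm__ __volatile__("strb %1, [%0]" : : "r"(location), "r"(value) : "memory");
      } else if constexpr (sizeof(T) == 2) {
        __asm__ __volatile__("strh %1, [%0]" : : "r"(location), "r"(value) : "memory");
      } else {
        __asm__ __volatile__("str %1, [%0]" : : "r"(location), "r"(value) : "memory");
      }
    }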


    For x86 AT&T syntax, this is possible with the z operand modifier, which prints the operand-size suffix corresponding to the type of the operand: e.g. %z0 expands to b if the first operand is a char, whereas plain %0 would expand to a byte register like %al.

    You'd use it like this:

    template <typename T>
    void store_ptr(T *location, T value) {
      __asm__ __volatile__(
         "mov%z0   %1, %0"
         : "=m" (*location)  // let the compiler pick an addressing mode
         : "re" (value)      // register or up-to-imm32 source
         :  // "memory"  // the compiler knows that *location is the only memory written
         );
    }
    

    e.g.

    void test(void *dst, int x) {
       store_ptr((int*)dst, x);
       store_ptr((short*)dst, short(x));
       store_ptr((char*)dst, char(-123));
       store_ptr((long*)dst, long(-123));
       store_ptr((long*)dst, long(0x00000000ffffffff));  // doesn't fit in sign-extended imm32
       store_ptr((unsigned*)dst, unsigned(0xffffffff));  // does fit, uses an immediate
       store_ptr((short*)dst, short(123));
    }
    

    This compiles correctly on Godbolt with x86-64 GCC for AT&T syntax. (But not with Clang 17; it doesn't seem to understand the %z0 modifier.) I used Godbolt's "link to binary" / "compile to binary object" option to verify that this compiler-generated asm also assembles; GCC itself doesn't check that, it just does string substitution into the assembler template, like a printf format string.

    test(void*, int):               # pointer in RDI, int in ESI
            movl   %esi, (%rdi)     # AT&T order is  op  src, dst  (reversed vs. most ARM insns; ARM's str is also src-first)
            movw   %si, (%rdi)
            movb   $-123, (%rdi)
            movq   $-123, (%rdi)         # mov $sign_extended_imm32, m64
    
            movl    $4294967295, %eax    # writing EAX zero-extends into RAX: RAX = 0x00000000ffffffff
            movq   %rax, (%rdi)          # this is the actual inline asm template, the previous instruction was generated to set up the "r" operand
    
            movl   $-1, (%rdi)          # unsigned(0xffffffff) does match an "e" constraint
            movw   $123, (%rdi)
            ret
    

    I could have done things like (long*)dst + x or (char*)dst + 8 to get other addressing modes as well, e.g. (%rdi,%rsi,8) after sign-extending ESI to RSI.
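
    For instance, a hypothetical follow-up to test() (names are mine, not from the original answer) that lets the "=m" output constraint pick those fancier addressing modes:

    void test_addr(void *dst, int x) {
       store_ptr((long*)dst + x, long(x));       // likely (%rdi,%rax,8) after movslq %esi, %rax
       store_ptr((int*)((char*)dst + 8), 123);   // likely movl $123, 8(%rdi)
    }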