How to print the register number with gcc-style inline assembly?


Inspired by a recent question.

One use case for gcc-style inline assembly is to encode instructions that neither the compiler nor the assembler knows about. For example, I gave this example of how to use the rdrand instruction on a toolchain too old to support it:

/* "rdrand %%rax ; setc %b1" */
asm volatile (".byte 0x48, 0x0f, 0xc7, 0xf0; setc %b1"
    : "=a"(result), "=qm"(success) :: "cc");

Unfortunately, hard-coding the instruction means that you also need to hard-code the registers used with it, greatly reducing the compiler's freedom to perform register allocation.

On some architectures (like RISC-V with its .insn directive) the assembler provides a way to systematically construct arbitrary instruction encodings, but that seems to be the exception.

A simple solution would be a way to obtain the undecorated number of the register so that it can be encoded into the instruction manually. For example, suppose a template modifier X existed that printed the number of the chosen register. Then the above example could be made more flexible like this:

/* "rdrand %0 ; setc %b1" */
asm volatile (".byte 0x48 | (%X0 >> 3), 0x0f, 0xc7, 0xf0 | (%X0 & 7); setc %b1"
    : "=r"(result), "=qm"(success) :: "cc");
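To make the arithmetic in that hypothetical template concrete, here is a minimal sketch in plain C of the bytes it is meant to produce. The helper name encode_rdrand64 is made up for illustration, and it assumes the usual x86-64 hardware register numbering (0 = rax, 1 = rcx, ..., 8 = r8, ..., 15 = r15):

```c
#include <stdint.h>

/* Hypothetical helper, not part of any toolchain: writes the 4-byte encoding
   of "rdrand <64-bit register>" for hardware register number reg (0-15). */
static void encode_rdrand64(unsigned reg, uint8_t out[4])
{
    out[0] = (uint8_t)(0x48 | ((reg >> 3) & 1)); /* REX.W, plus REX.B for r8-r15 */
    out[1] = 0x0f;                               /* two-byte opcode escape */
    out[2] = 0xc7;                               /* opcode */
    out[3] = (uint8_t)(0xf0 | (reg & 7));        /* ModRM: mod=11, /6, rm=low 3 bits */
}
```

For reg = 0 (rax) this yields 48 0f c7 f0, the same bytes as the hard-coded version above.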

Similarly, if there were a way to have gcc print 12 instead of v12 for SIMD register 12 on ARM64, it would be possible to do things like this:

float32x4_t add3(float32x4_t a, float32x4_t b)
{
    float32x4_t c;

    /* fadd %0, %1, %2 */
    asm (".inst 0x4e20d400 + %X0 + (%X1<<5) + (%X2<<16)" : "=w"(c) : "w"(a), "w"(b));

    return c;
}
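As a sketch of the value that .inst expression is supposed to evaluate to, the same computation can be written in plain C. The helper name encode_fadd_4s is hypothetical, and the base word 0x4e20d400 is assumed to be the encoding of fadd v0.4s, v0.4s, v0.4s:

```c
#include <stdint.h>

/* Hypothetical helper, not part of any toolchain: returns the 32-bit word for
   "fadd vD.4s, vN.4s, vM.4s" given the three register numbers (0-31). */
static uint32_t encode_fadd_4s(unsigned rd, unsigned rn, unsigned rm)
{
    const uint32_t base = 0x4e20d400u;      /* assumed: fadd v0.4s, v0.4s, v0.4s */

    return base
         | (uint32_t)(rd & 31u)             /* Rd: bits 0-4   */
         | ((uint32_t)(rn & 31u) << 5)      /* Rn: bits 5-9   */
         | ((uint32_t)(rm & 31u) << 16);    /* Rm: bits 16-20 */
}
```

Because the three register fields do not overlap with each other or with the set bits of the base word, adding and OR-ing the terms give the same result.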

Is there a way to obtain the register number? If not, what other options exist to encode instructions that neither the compiler nor the assembler knows about without having to hard-code register numbers?


Solution

  • I've actually had the same problem and came up with the following solution.

    #define REG_CONST(n) asm(".equ .L__reg_const__v" #n ", " #n);
    
    REG_CONST(0)
    REG_CONST(1)
    REG_CONST(2)
    REG_CONST(3)
    // ... repeat this for all register numbers ...
    REG_CONST(27)
    REG_CONST(28)
    REG_CONST(29)
    REG_CONST(30)
    REG_CONST(31)
    
    float32x4_t add3(float32x4_t a, float32x4_t b) {
        float32x4_t c;
        // fadd %0, %1, %2
        asm(".inst 0x4e20d400 | .L__reg_const__%0 | (.L__reg_const__%1 << 5) | (.L__reg_const__%2 << 16)" : "=w"(c) : "w"(a), "w"(b));
    
        return c;
    }
    

    How does this work?

    1. Keep in mind that placeholders like %0, %1, ... are filled in with a register name through simple string replacement by the compiler before the result is passed to the assembler.
    2. Inside assembly files we can use the .equ directive to define symbols that represent integers. (Symbols that start with .L will not be visible in the generated object file, so we don't unnecessarily clutter the symbol table.)
    3. Each invocation of the REG_CONST macro defines a (local) symbol: .L__reg_const__v0, which is equal to 0, .L__reg_const__v1, which is equal to 1, .L__reg_const__v2, which is equal to 2, and so on.
    4. The macros are intentionally placed at the top of the file, outside any function, because the resulting asm(".equ .L__reg_const__v0, 0") statement is supposed to go at the top of the assembly file.
    5. In the asm(".inst ...") template inside the add3 function, %0, %1 and %2 are then replaced with whatever registers the compiler selected for c, a and b.
    6. Since we sneakily wrote each placeholder directly after the .L__reg_const__ prefix, without any space, the replacement turns it into a name like .L__reg_const__v7.
    7. But this is exactly the name of one of the integer symbols we defined at the top! So the assembler picks it up as a symbol and replaces it with the integer value we defined.
    8. After the symbols are evaluated, the result is a purely numeric expression, and the assembler will happily "or" the integer values together, yielding the desired opcode.
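The steps above can be imitated in ordinary C to see the mechanism without an ARM64 toolchain. This is only a simulation under stated assumptions: lookup_reg_const stands in for the assembler's symbol table, assemble_fadd_4s stands in for the compiler's text substitution plus the assembler's expression evaluation, and both names are invented for this sketch. The base word 0x4e20d400 is assumed to be fadd v0.4s, v0.4s, v0.4s.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the assembler's symbol table: resolves ".L__reg_const__vN"
   to N for N in 0..31, and returns -1 for any other name. */
static int lookup_reg_const(const char *symbol)
{
    static const char prefix[] = ".L__reg_const__v";
    const size_t plen = sizeof prefix - 1;
    const char *digits;
    char *end;
    long n;

    if (strncmp(symbol, prefix, plen) != 0)
        return -1;
    digits = symbol + plen;
    if (!isdigit((unsigned char)digits[0]))
        return -1;
    n = strtol(digits, &end, 10);
    if (*end != '\0' || n > 31)
        return -1;
    return (int)n;
}

/* Stand-in for steps 5-8: paste each register name after the prefix,
   resolve the resulting symbol, and OR the values into the opcode.
   Returns the instruction word, or -1 if a symbol is undefined. */
static int64_t assemble_fadd_4s(const char *rd, const char *rn, const char *rm)
{
    const char *regs[3] = { rd, rn, rm };
    const int shifts[3] = { 0, 5, 16 };
    uint32_t word = 0x4e20d400u;            /* assumed base encoding */

    for (int i = 0; i < 3; i++) {
        char symbol[64];
        int value;

        /* Text substitution: ".L__reg_const__%0" becomes ".L__reg_const__v7". */
        snprintf(symbol, sizeof symbol, ".L__reg_const__%s", regs[i]);
        value = lookup_reg_const(symbol);
        if (value < 0)
            return -1;                      /* the assembler would report an undefined symbol */
        word |= (uint32_t)value << shifts[i];
    }
    return (int64_t)word;
}
```

Calling assemble_fadd_4s("v0", "v1", "v2") walks through the same substitution the compiler would perform if it picked v0, v1 and v2. A name without a matching symbol, such as "x0", is rejected, which mirrors why every register number the compiler may choose needs its own REG_CONST line.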