Tags: c, gcc, x86, inline-assembly, cpu-registers

Why can't I get the value of asm registers in C?


I'm trying to read the values of the registers rdi, rsi, rdx, rcx and r8 from C, but I'm getting the wrong values. I don't know whether my code actually reads those registers or instead tells the compiler to write to them. If it's the latter, how can I achieve what I'm trying to do (put the values of these registers into C variables)?

When I compile this code (with gcc -S test.c)

#include <stdio.h>

void    beautiful_function(int a, int b, int c, int d, int e) {
    register long   rdi asm("rdi");
    register long   rsi asm("rsi");
    register long   rdx asm("rdx");
    register long   rcx asm("rcx");
    register long   r8 asm("r8");

    const long      save_rdi = rdi;
    const long      save_rsi = rsi;
    const long      save_rdx = rdx;
    const long      save_rcx = rcx;
    const long      save_r8 = r8;
    printf("%ld\n%ld\n%ld\n%ld\n%ld\n", save_rdi, save_rsi, save_rdx, save_rcx, save_r8);
}

int main(void) {
    beautiful_function(1, 2, 3, 4, 5);
}

it outputs the following assembly code (before the function call):

    movl    $1, %edi
    movl    $2, %esi
    movl    $3, %edx
    movl    $4, %ecx
    movl    $5, %r8d
    callq   _beautiful_function

When I compile and run it, it outputs this:

0
0
4294967296
140732705630496
140732705630520
(some undefined values)

What did I do wrong, and how could I do this?


Solution

  • Even if you had a valid way of doing this (which this isn't), it probably only makes sense at the top of a function which isn't inlined. So you'd probably need __attribute__((noinline, noclone)). (noclone is a GCC attribute that clang will warn about not recognizing; it means not to make an alternate version of the function with fewer actual args, to be called in the case where some of them are known constants that can get propagated into the clone.)

    register ... asm local vars aren't guaranteed to do anything except when used as operands to Extended Asm statements. GCC does sometimes still read the named register if you leave it uninitialized, but clang doesn't. (And it looks like you're on a Mac, where the gcc command is actually clang, because so many build scripts use gcc instead of cc.)

    So even without optimization, the stand-alone non-inlined version of your beautiful_function is just reading uninitialized stack space when it reads your rdi C variable in const long save_rdi = rdi;. (GCC does happen to do what you wanted here, even at -Os, which optimizes but chooses not to inline your function. See clang and GCC (targeting Linux) on Godbolt, with asm + program output.)
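
If what you ultimately want is the values the caller passed, the function parameters already are those values: in the x86-64 System V calling convention (the one your asm output shows), the first five integer args arrive in rdi, rsi, rdx, rcx and r8 (the low 32 bits of each for int). A minimal sketch, assuming that ABI and a compiler that understands GNU attributes; the KEEP_STANDALONE macro, the saved_args array and capture_args are hypothetical names for illustration, not anything from your code:

```c
/* __has_attribute exists in clang and in recent GCC; treat it as
   "attribute unknown" on compilers that lack it. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Hypothetical helper macro: ask for noclone only where the compiler
   recognizes it, which avoids clang's unknown-attribute warning. */
#if __has_attribute(noclone)
#define KEEP_STANDALONE __attribute__((noinline, noclone))
#else
#define KEEP_STANDALONE __attribute__((noinline))
#endif

/* Hypothetical storage for the five incoming values. */
long saved_args[5];

KEEP_STANDALONE
void capture_args(int a, int b, int c, int d, int e) {
    /* The compiler knows where each parameter lives, so reading the
       parameters is well defined with any compiler and option set. */
    saved_args[0] = a;
    saved_args[1] = b;
    saved_args[2] = c;
    saved_args[3] = d;
    saved_args[4] = e;
}
```

After capture_args(1, 2, 3, 4, 5), saved_args holds 1 through 5 with either compiler, because the code reads parameters instead of guessing about register contents.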


    Using an asm statement to make register asm do something

    (This does what you say you want (reading registers), but because of other optimizations, still doesn't produce 1 2 3 4 5 with clang when the caller can see the definition. Only with actual GCC. There might be a clang option to disable some relevant IPA / IPO optimization, but I didn't find one.)

    You can use an asm volatile() statement with an empty template string to tell the compiler that the values in those registers are now the values of those C variables. (The register ... asm declarations force it to pick the right register for each variable.)

    #include <stdlib.h> 
    #include <stdio.h>
    
    __attribute__((noinline,noclone))
    void    beautiful_function(int a, int b, int c, int d, int e) {
        register long   rdi asm("rdi");
        register long   rsi asm("rsi");
        register long   rdx asm("rdx");
        register long   rcx asm("rcx");
        register long   r8 asm("r8");
    
        // "activate" the register-asm locals:
        // associate register values with C vars here, at this point
        asm volatile("nop  # asm statement here"   // can be empty; nop is just because Godbolt filters asm comments
            : "=r"(rdi), "=r"(rsi), "=r"(rdx), "=r"(rcx), "=r"(r8) );
    
        const long      save_rdi = rdi;
        const long      save_rsi = rsi;
        const long      save_rdx = rdx;
        const long      save_rcx = rcx;
        const long      save_r8 = r8;
        printf("%ld\n%ld\n%ld\n%ld\n%ld\n", save_rdi, save_rsi, save_rdx, save_rcx, save_r8);
    }
    
    int main(void) {
        beautiful_function(1, 2, 3, 4, 5);
    }
    

    This makes asm in your beautiful_function that does capture the incoming values of your registers. (It doesn't inline, and the compiler happens not to have used any instructions before the asm statement that step on any of those registers. The latter is not guaranteed in general.)
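
For a register whose value is meaningful at any point in a function, such as the stack pointer, a variation is to name the register in the asm template itself and let the compiler choose the output register. This is a different technique from the register-asm locals above, sketched here assuming x86-64 and the default AT&T syntax; read_rsp is a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical helper: returns the value rsp holds when the asm
   statement executes (x86-64 only, AT&T operand order). */
uintptr_t read_rsp(void) {
    uintptr_t sp;
    /* The template names rsp explicitly, so "=r" lets the compiler pick
       any general-purpose register for the output operand.
       __asm__ is the spelling that also works under -std=c99 / -std=c11. */
    __asm__ volatile("movq %%rsp, %0" : "=r"(sp));
    return sp;
}
```

This would not be any more reliable than the register-asm version for rdi or rsi, since the compiler may still have reused those registers before the statement runs.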

    On Godbolt with clang -O3 and gcc -O3

    gcc -O3 does actually work, printing what you expect. clang still prints garbage, because the caller sees that the args are unused, and decides not to set those registers. (If you'd hidden the definition from the caller, e.g. in another file without LTO, that wouldn't happen.)

    (With GCC, the noinline,noclone attributes are enough to disable this inter-procedural optimization, but not with clang. Not even compiling with -fPIC disables it there. I guess the idea is that symbol-interposition to provide an alternate definition of beautiful_function that does use its args would violate the one definition rule in C. So if clang can see a definition for a function, it assumes that's how the function works, even if it isn't allowed to actually inline it.)

    With clang:

    main:
            pushq   %rax          # align the stack
         # arg-passing optimized away
            callq   beautiful_function@PLT
        # indirect through the PLT because I compiled for Linux with -fPIC, 
        # and the function isn't "static"
            xorl    %eax, %eax
            popq    %rcx
            retq
    

    But the actual definition for beautiful_function does exactly what you want:

    # clang -O3
    beautiful_function:
            pushq   %r14
            pushq   %rbx
            nop     # asm statement here
            movq    %rdi, %r9             # copying all 5 register outputs to different regs
            movq    %rsi, %r10
            movq    %rdx, %r11
            movq    %rcx, %rbx
            movq    %r8, %r14
            leaq    .L.str(%rip), %rdi
            xorl    %eax, %eax
            movq    %r9, %rsi                # then copying them to printf args
            movq    %r10, %rdx
            movq    %r11, %rcx
            movq    %rbx, %r8
            movq    %r14, %r9
            popq    %rbx
            popq    %r14
            jmp     printf@PLT              # TAILCALL
    

    GCC wastes fewer instructions. For example, it starts with movq %r8, %r9 to move your r8 C var into place as the 6th arg to printf, then movq %rcx, %r8 to set up the 5th arg, overwriting one of the output registers before it has read all of them. That's something clang was over-cautious about. However, GCC does still push/pop %r12 around the asm statement; I don't understand why. It ends by tailcalling printf, so it wasn't for alignment.
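
To approximate the "definition hidden from the caller" case mentioned earlier within a single file, one option is to call through a volatile function pointer: the compiler has to load the pointer at run time and can't assume which function it reaches, so it has to pass every arg the way the calling convention requires. A sketch under those assumptions (x86-64 System V, GNU C extensions), not something the Godbolt links above cover; seen, capture_regs, capture_ptr and call_hidden are hypothetical names:

```c
/* Hypothetical storage for the captured register values. */
long seen[3];

__attribute__((noinline))
void capture_regs(long a, long b, long c) {
    register long rdi __asm__("rdi");
    register long rsi __asm__("rsi");
    register long rdx __asm__("rdx");

    /* Same idiom as above: an empty template whose outputs tell the
       compiler "these registers now hold these variables". */
    __asm__ volatile("" : "=r"(rdi), "=r"(rsi), "=r"(rdx));

    seen[0] = rdi;
    seen[1] = rsi;
    seen[2] = rdx;

    /* The parameters stay deliberately unused, as in the question. */
    (void)a; (void)b; (void)c;
}

/* volatile: the pointer must be reloaded at each call, so the caller
   cannot prove which function runs or that its args are ignored. */
void (*volatile capture_ptr)(long, long, long) = capture_regs;

void call_hidden(void) {
    capture_ptr(1, 2, 3);   /* args must be materialized in rdi, rsi, rdx */
}
```

After call_hidden(), seen would typically hold 1, 2 and 3 with either compiler, with the same caveat as before: nothing guarantees the compiler leaves those registers untouched between function entry and the asm statement.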


    Related: