Tags: assembly, x86-64, reverse-engineering, disassembly, att

What does call *(%rax, %rcx,8) mean in a loop? Why do RAX and RCX change afterwards?


I have this function pointer and this code:

0x0000555555556e80 <+0>:     push   %rbp
0x0000555555556e81 <+1>:     mov    %rsp,%rbp
0x0000555555556e84 <+4>:     sub    $0x10,%rsp
0x0000555555556e88 <+8>:     movl   $0x0,-0x4(%rbp)
0x0000555555556e8f <+15>:    movslq -0x4(%rbp),%rcx
0x0000555555556e93 <+19>:    lea    0x7406(%rip),%rax        # 0x55555555e2a0 <init_functions>
0x0000555555556e9a <+26>:    cmpq   $0x0,(%rax,%rcx,8)
0x0000555555556e9f <+31>:    je     0x555555556ec1 <initialize_bomb+65>
0x0000555555556ea5 <+37>:    movslq -0x4(%rbp),%rcx
0x0000555555556ea9 <+41>:    lea    0x73f0(%rip),%rax        # 0x55555555e2a0 <init_functions>
0x0000555555556eb0 <+48>:    call   *(%rax,%rcx,8)
0x0000555555556eb3 <+51>:    mov    -0x4(%rbp),%eax
0x0000555555556eb6 <+54>:    add    $0x1,%eax
0x0000555555556eb9 <+57>:    mov    %eax,-0x4(%rbp)
0x0000555555556ebc <+60>:    jmp    0x555555556e8f <initialize_bomb+15>
0x0000555555556ec1 <+65>:    add    $0x10,%rsp
0x0000555555556ec5 <+69>:    pop    %rbp
0x0000555555556ec6 <+70>:    ret    

It's a loop that iterates 11 times, and I really don't know how to handle the function pointer. When there's the call *(%rax,%rcx,8), the two registers involved (RAX and RCX) change but I don't understand how or why, since I don't know what happens in that call...

I can't put breakpoints. I don't know what to do.


Solution

  • call *address loads a function pointer from memory into RIP, using standard AT&T syntax for an addressing mode (or register name) following the *. See What does an asterisk * before an address mean in x86-64 AT&T assembly? So this pushes a return address then loads a new RIP from the address [rax + rcx*8].

    The call *foo syntax (EIP/RIP = dword/qword loaded from memory at foo, memory-indirect) has an asterisk to disambiguate it from call foo (RIP = address of foo, a direct call rel32), in case you were using just a bare symbol name as the addressing mode.

    In 64-bit mode you'd normally use call *foo(%rip) for a static function pointer that wasn't in an array, but AT&T syntax was designed long before x86-64 existed, and 64-bit mode would still have that ambiguity. (In all other cases, GAS will warn if you leave out the *, and infer that you meant an indirect jump/call if you write something like call (%rax) or call %rax.)
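As a rough illustration of the difference between a direct call and a memory-indirect call through base + index*8, here is a minimal C sketch. The names first_init, second_init, demo_table, call_direct and call_indirect are made up for this example and are not from the bomb binary; it also assumes a 64-bit target where a function pointer is 8 bytes.

```c
#include <assert.h>
#include <string.h>

typedef void (*init_fn)(void);

/* Hypothetical stand-ins for the real init functions. */
int calls;
void first_init(void)  { calls += 1; }
void second_init(void) { calls += 10; }

/* Hypothetical table; the real init_functions lives in the bomb binary. */
init_fn demo_table[] = { first_init, second_init, 0 };

/* Direct call: the target is known at link time, so a compiler would
   typically emit "call first_init" (call rel32). */
void call_direct(void)
{
    first_init();
}

/* Memory-indirect call: load an 8-byte pointer from base + index*8 and
   transfer control to it, which is what call *(%rax,%rcx,8) does with
   base in RAX and index in RCX. */
void call_indirect(const void *base, long index)
{
    init_fn target;
    assert(sizeof target == 8);   /* assumption: x86-64 style 8-byte pointers */
    memcpy(&target, (const char *)base + index * 8, sizeof target);
    target();
}
```

Calling call_indirect(demo_table, 1) runs second_init, the entry stored 8 bytes past the start of the table.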


    RAX and RCX are call-clobbered so it's normal they don't keep their value; notice how their values before call come from an LEA and a load from a local on the stack. (What registers are preserved through a linux x86-64 function call)

    If you want to see what functions are called, use GDB stepi (aka si) to single-step into the call. (Put a breakpoint somewhere in this function so you can single step from there.)


    If you want to understand the loop, look at the code surrounding the call.

    A RIP-relative LEA puts a constant address into RAX; as fjs points out, there's a symbol name init_functions.

    RCX is loaded from a local var on the stack, with sign-extension from 32-bit to 64. Looking at the surrounding code, this is clearly a loop counter, initialized to zero earlier in the function. Presumably an int.

    Before the call, the same array indexing is done to check if it's a NULL pointer. This is clearly debug-mode compiler output, where each C statement is compiled to a separate block of asm. That means you only have to look locally to see what a block is doing in isolation, but it leads to much more code than would be necessary, e.g. two accesses to the array and redoing sign-extension of the loop counter each time.
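To make the block-per-statement structure concrete, the disassembly can be mirrored in C with gotos, one line per asm block. This is only a sketch: init_a, init_b, init_c, sample_table and run_unoptimized are hypothetical names, the table is passed as a parameter instead of being the global init_functions, and the function returns the final counter purely so the result can be checked (the real function returns nothing).

```c
#include <assert.h>
#include <stddef.h>

typedef void (*init_fn)(void);

/* Hypothetical functions that just record the order they ran in. */
int log_len;
char log_buf[8];
void init_a(void) { log_buf[log_len++] = 'a'; }
void init_b(void) { log_buf[log_len++] = 'b'; }
void init_c(void) { log_buf[log_len++] = 'c'; }

/* Hypothetical NULL-terminated table standing in for init_functions. */
init_fn sample_table[] = { init_a, init_b, init_c, NULL };

/* Mirrors the debug-build asm: the index is reloaded and the array
   element is addressed again for the test and for the call. */
int run_unoptimized(init_fn *table)
{
    assert(table != NULL);
    int i = 0;                      /* movl $0x0,-0x4(%rbp) */
top:
    if (table[(long)i] == NULL)     /* movslq; lea; cmpq $0x0,(%rax,%rcx,8) */
        goto done;                  /* je <+65> */
    table[(long)i]();               /* movslq; lea; call *(%rax,%rcx,8) */
    i = i + 1;                      /* mov; add $0x1; mov */
    goto top;                       /* jmp <+15> */
done:
    return i;                       /* number of functions called */
}
```

With this shape, the loop stops at the first NULL entry, so a loop that really runs 11 times would correspond to 11 non-NULL pointers followed by a NULL terminator.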

    A loop this simple is easy to follow even in an optimized build. Well, easy enough: GCC rotates the loop so the condition is at the bottom, partially peeling the first iteration. It also checks the first condition before saving RBX (shrink-wrap optimization), which it then uses to hold a pointer into the array instead of using a base address and a separate integer index.

    extern void (*init_functions[])();
    
    void init(){
        for(int i=0 ; init_functions[i]  ; i++) 
            init_functions[i]();
    }
    

    Godbolt (GCC, optimized build):

    init():
            movq    init_functions(%rip), %rax   # partially peeled first iteration
            testq   %rax, %rax
            je      .L9
            pushq   %rbx
            leaq    8+init_functions(%rip), %rbx      # fptr = &init_functions[1]
              # enter the loop with RAX holding first array entry
    .L3:                         # do{
            call    *%rax
            movq    (%rbx), %rax    # load the next function pointer
            addq    $8, %rbx        # fptr++
            testq   %rax, %rax      # and test it
            jne     .L3          # }while( *fptr != 0 )
            popq    %rbx
            ret
    .L9:
            ret         # silly compiler, no need for tail duplication here.
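For comparison, the optimized asm corresponds roughly to the following hand-written C, with the first test peeled off and a pointer walking the array. This is a sketch, not compiler output translated mechanically: bump, three_entries, empty_table and run_rotated are hypothetical names, and the table is passed as a parameter instead of being the global init_functions.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*init_fn)(void);

/* Hypothetical function that counts how many times it was called. */
int hits;
void bump(void) { hits++; }

/* Hypothetical NULL-terminated tables. */
init_fn three_entries[] = { bump, bump, bump, NULL };
init_fn empty_table[]   = { NULL };

/* Rotated loop: test the first entry up front, then do/while with the
   condition at the bottom. */
void run_rotated(init_fn *table)
{
    assert(table != NULL);
    init_fn fn = table[0];          /* movq init_functions(%rip), %rax */
    if (fn == NULL)                 /* testq %rax,%rax ; je .L9 */
        return;
    init_fn *fptr = &table[1];      /* leaq 8+init_functions(%rip), %rbx */
    do {
        fn();                       /* call *%rax */
        fn = *fptr;                 /* movq (%rbx), %rax */
        fptr++;                     /* addq $8, %rbx */
    } while (fn != NULL);           /* testq %rax,%rax ; jne .L3 */
}
```

Here fptr plays the role of RBX, which is call-preserved and so survives each call, while fn plays the role of RAX and has to be reloaded after every call.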