c assembly x86 reverse-engineering att

Trying to reverse engineer a function


I'm trying to understand x86 assembly better. I have a mystery function here that I know returns an int and takes an int argument, so it looks like int mystery(int n){}. However, I can't figure out what the function is in C. The assembly is:

mov  %edi, %eax
lea  0x0(,%rdi, 8), %edi
sub  %eax, %edi
add  $0x4, %edi
callq <mystery_util>
repz retq

<mystery_util>
mov  %edi, %eax
shr  %eax
and  $0x1, %edi
and  %edi, %eax
retq

I don't understand what the lea does here and what kind of function it could be.


Solution

  • The assembly code appears to be compiler generated, and was probably produced by GCC, since there is a repz retq directly after an unconditional branch (call). The fact that mystery_util is reached with a call rather than a tail call (jmp) also suggests that the code was compiled with -O1 (higher optimization levels would likely inline the function, which didn't happen here). The lack of frame pointers and extra loads/stores indicates that it wasn't compiled with -O0.

    Multiplying x by 7 is the same as multiplying x by 8 and subtracting x. That is what the following code is doing:

    lea  0x0(,%rdi, 8), %edi
    sub  %eax, %edi
    

    LEA can compute addresses, but it can be used for simple arithmetic as well. The syntax for a memory operand is displacement(base, index, scale). Scale can be 1, 2, 4, or 8. The computation is displacement + base + index * scale. In your case lea 0x0(,%rdi, 8), %edi is effectively EDI = 0x0 + RDI * 8 or EDI = RDI * 8. The sub then removes the copy of n saved in EAX, leaving n * 7, and add $0x4, %edi makes the full calculation n * 7 + 4.
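
    As a rough sketch of the arithmetic above, the four instructions before the call can be mirrored one at a time in C. The function name lea_sequence and the local variables eax and edi are made up for illustration; they only stand in for the registers and are not part of the original program.

```c
#include <stdint.h>

/* Hypothetical helper: mirrors the instructions that precede the call
   in mystery, one C statement per instruction. */
uint32_t lea_sequence(uint32_t n)
{
    uint32_t eax = n;       /* mov  %edi, %eax         : save n              */
    uint32_t edi = n * 8u;  /* lea  0x0(,%rdi,8), %edi : 0 + index * 8 = 8n  */
    edi -= eax;             /* sub  %eax, %edi         : 8n - n = 7n         */
    edi += 4u;              /* add  $0x4, %edi         : 7n + 4              */
    return edi;
}
```

    For example, lea_sequence(5) works out to 40 - 5 + 4 = 39, which is 7 * 5 + 4.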

    The calculation for mystery_util appears to simply be

    n &= (n>>1) & 1;
    

    Taking all these factors together, we have a function mystery that passes n * 7 + 4 to a function called mystery_util, which returns n &= (n>>1) & 1.

    Since mystery_util returns a single bit value (0 or 1) it is reasonable that bool is the return type.
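
    To make the single-bit result concrete, here is a small sketch of the two functions as described so far. The names util_sketch and mystery_sketch are placeholders chosen for this illustration. The expression keeps bit 0 of n only when bit 1 is also set, so the result is 1 exactly when the two lowest bits of the argument are both 1. Since 7n + 4 has both low bits set only when n leaves remainder 1 on division by 4, mystery_sketch returns 1 for n = 1, 5, 9, and so on.

```c
/* Hypothetical names; the bodies follow the expressions in the answer. */
unsigned int util_sketch(unsigned int n)
{
    /* (n >> 1) & 1 isolates bit 1; ANDing it with n keeps bit 0
       only if bit 1 was set, so the result is always 0 or 1. */
    n &= (n >> 1) & 1;
    return n;
}

unsigned int mystery_sketch(unsigned int n)
{
    /* 7n + 4, as computed by the lea/sub/add sequence. */
    return util_sketch(7 * n + 4);
}
```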

    I was curious if I could get a particular version of GCC with optimization level 1 (-O1) to reproduce this assembly code. I discovered that GCC 4.9.x will yield this exact assembly code for this given C program:

    #include<stdbool.h>
    
    bool mystery_util(unsigned int n)
    {
        n &= (n>>1) & 1;
        return n;
    }
    
    bool mystery(unsigned int n)
    {
        return mystery_util (7*n+4);
    }
    

    The assembly output is:

    mystery_util:
            movl    %edi, %eax
            shrl    %eax
            andl    $1, %edi
            andl    %edi, %eax
            ret
    mystery:
            movl    %edi, %eax
            leal    0(,%rdi,8), %edi
            subl    %eax, %edi
            addl    $4, %edi
            call    mystery_util
            rep ret
    

    You can play with this code on godbolt.


    Important Update - Version without bool

    I apparently erred in interpreting the question. I assumed the person asking had worked out for themselves that the prototype for mystery was int mystery(int n), and I thought I could change it. According to a related question asked on Stack Overflow a day later, it seems int mystery(int n) is given to you as the prototype as part of the assignment. This is important because it means that a modification has to be made.

    The change that needs to be made is related to mystery_util. In the code to be reverse engineered are these lines:

    mov  %edi, %eax
    shr  %eax
    

    EDI is the first parameter. SHR is a logical shift right. Compilers would only generate this if EDI was an unsigned int (or equivalent). int is a signed type and would generate SAR (arithmetic shift right). This means that the parameter for mystery_util has to be unsigned int (and it follows that the return value is likely unsigned int). That means the code would look like this:

    unsigned int mystery_util(unsigned int n)
    {
        n &= (n>>1) & 1;
        return n;
    }
    
    int mystery(int n)
    {
        return mystery_util (7*n+4);
    }
    

    mystery now has the prototype given by your professor (bool is removed) and we use unsigned int for the parameter and return type of mystery_util. In order to generate this code with GCC 4.9.x I found you need to use -O1 -fno-inline. This code can be found on godbolt. The assembly output is the same as the version using bool.

    If you use unsigned int mystery_util(int n) you would discover that it doesn't quite output what we want:

    mystery_util:
            movl    %edi, %eax
            sarl    %eax          # <------- SAR (arithmetic shift right) is not SHR
            andl    $1, %edi
            andl    %edi, %eax
            ret
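
    The difference between the two shifts can be sketched in C with a pair of one-line helpers. The names shift_logical and shift_signed are invented for this example. Note the assumption spelled out in the comment: right-shifting a negative signed value is implementation-defined in C, and the arithmetic behaviour described here is what GCC documents for its targets, not something the language guarantees.

```c
#include <stdint.h>

/* Unsigned operand: the vacated top bit is filled with 0.
   This is the case that corresponds to SHR. */
uint32_t shift_logical(uint32_t x)
{
    return x >> 1;
}

/* Signed operand: for negative values the result is implementation-defined
   in C. Assumption: GCC on x86 uses an arithmetic shift (SAR), which copies
   the sign bit into the vacated top bit. */
int32_t shift_signed(int32_t x)
{
    return x >> 1;
}
```

    With these helpers, shift_logical(0x80000000u) gives 0x40000000u because the top bit is cleared, whereas an arithmetic shift of the same bit pattern would keep the top bit set.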