Trying to reverse engineer a function

I'm trying to understand assembly in x86 more. I have a mystery function here that I know returns an int and takes an int argument. So it looks like int mystery(int n){}. I can't figure out the function in C however. The assembly is:

mov  %edi, %eax
lea  0x0(,%rdi, 8), %edi
sub  %eax, %edi
add  $0x4, %edi
callq < mystery _util >
repz retq

< mystery _util >
mov  %edi, %eax
shr  %eax
and  $0x1, %edi
and  %edi, %eax
retq

I don't understand what the lea does here and what kind of function it could be.

Solution

The assembly code appeared to be computer generated, and something that was probably compiled by GCC since there is a repz retq after an unconditional branch (call). There is also an indication that because there isn't a tail call (jmp) instead of a call when going to mystery_util that the code was compiled with -O1 (higher optimization levels would likely inline the function which didn't happen here). The lack of frame pointers and extra load/stores indicated that it isn't compiled with -O0

Multiplying x by 7 is the same as multiplying x by 8 and subtracting x. That is what the following code is doing:

lea  0x0(,%rdi, 8), %edi
sub  %eax, %edi

LEA can compute addresses but it can be used for simple arithmetic as well. The syntax for a memory operand is displacement(base, index, scale). Scale can be 1, 2, 4, 8. The computation is displacement + base + index * scale. In your case lea 0x0(,%rdi, 8), %edi is effectively EDI = 0x0 + RDI * 8 or EDI = RDI * 8. The full calculation is n * 7 - 4;

The calculation for mystery_util appears to simply be

n &= (n>>1) & 1;

If I take all these factors together we have a function mystery that passes n * 7 - 4 to a function called mystery_util that returns n &= (n>>1) & 1.

Since mystery_util returns a single bit value (0 or 1) it is reasonable that bool is the return type.

I was curious if I could get a particular version of GCC with optimization level 1 (-O1) to reproduce this assembly code. I discovered that GCC 4.9.x will yield this exact assembly code for this given C program:

#include<stdbool.h>

bool mystery_util(unsigned int n)
{
    n &= (n>>1) & 1;
    return n;
}

bool mystery(unsigned int n)
{
    return mystery_util (7*n+4);
}

The assembly output is:

mystery_util:
        movl    %edi, %eax
        shrl    %eax
        andl    $1, %edi
        andl    %edi, %eax
        ret
mystery:
        movl    %edi, %eax
        leal    0(,%rdi,8), %edi
        subl    %eax, %edi
        addl    $4, %edi
        call    mystery_util
        rep ret

You can play with this code on godbolt.

Important Update - Version without bool

I apparently erred in interpreting the question. I assumed the person asking this question determined by themselves that the prototype for mystery was int mystery(int n). I thought I could change that. According to a related question asked on Stackoverflow a day later, it seems int mystery(int n) is given to you as the prototype as part of the assignment. This is important because it means that a modification has to be made.

The change that needs to be made is related to mystery_util. In the code to be reverse engineered are these lines:

mov  %edi, %eax
shr  %eax

EDI is the first parameter. SHR is logical shift right. Compilers would only generate this if EDI was an unsigned int (or equivalent). int is a signed type an would generate SAR (arithmetic shift right). This means that the parameter for mystery_util has to be unsigned int (and it follows that the return value is likely unsigned int. That means the code would look like this:

unsigned int mystery_util(unsigned int n)
{
    n &= (n>>1) & 1;
    return n;
}

int mystery(int n)
{
    return mystery_util (7*n+4);
}

mystery now has the prototype given by your professor (bool is removed) and we use unsigned int for the parameter and return type of mystery_util. In order to generate this code with GCC 4.9.x I found you need to use -O1 -fno-inline. This code can be found on godbolt. The assembly output is the same as the version using bool.

If you use unsigned int mystery_util(int n) you would discover that it doesn't quite output what we want:

mystery_util:
        movl    %edi, %eax
        sarl    %eax          ; <------- SAR (arithmetic shift right) is not SHR
        andl    $1, %edi
        andl    %edi, %eax
        ret