Binary Bomb Phase 2 - Decoding assembly code

I've just began learning about assembly code and I came across this binary bomb lab and felt it would be a good exercise.

This is phase 2 of the binary bomb and I need to figure out a 6 number password from this assembly code in order to move to the next phase.

I've been looking at it for a good while now and I can't seem to figure it out.

Any input or explanation on how to potentially solve this would be much appreciated. I would like to get a good understanding of this in order to move forward to the more complicated phases.

    Dump of assembler code for function phase_2:
   0x0000000000400f49 <+0>: push   %rbp
   0x0000000000400f4a <+1>: push   %rbx
   0x0000000000400f4b <+2>: sub    $0x28,%rsp
   0x0000000000400f4f <+6>: mov    %fs:0x28,%rax
   0x0000000000400f58 <+15>:    mov    %rax,0x18(%rsp)
   0x0000000000400f5d <+20>:    xor    %eax,%eax
   0x0000000000400f5f <+22>:    mov    %rsp,%rsi
   0x0000000000400f62 <+25>:    callq  0x401682 <read_six_numbers>
   0x0000000000400f67 <+30>:    cmpl   $0x1,(%rsp) //first password number = 1?
   0x0000000000400f6b <+34>:    je     0x400f72 <phase_2+41>
   0x0000000000400f6d <+36>:    callq  0x40164c <explode_bomb>
   0x0000000000400f72 <+41>:    mov    %rsp,%rbx
   0x0000000000400f75 <+44>:    lea    0x14(%rsp),%rbp
   0x0000000000400f7a <+49>:    mov    (%rbx),%eax
=> 0x0000000000400f7c <+51>:    add    %eax,%eax 
   0x0000000000400f7e <+53>:    cmp    %eax,0x4(%rbx)
   0x0000000000400f81 <+56>:    je     0x400f88 <phase_2+63>
   0x0000000000400f83 <+58>:    callq  0x40164c <explode_bomb>
   0x0000000000400f88 <+63>:    add    $0x4,%rbx
   0x0000000000400f8c <+67>:    cmp    %rbp,%rbx
   0x0000000000400f8f <+70>:    jne    0x400f7a <phase_2+49> // loop? What is it doing?
   0x0000000000400f91 <+72>:    mov    0x18(%rsp),%rax
---Type <return> to continue, or q <return> to quit---return
   0x0000000000400f96 <+77>:    xor    %fs:0x28,%rax
   0x0000000000400f9f <+86>:    je     0x400fa6 <phase_2+93>
   0x0000000000400fa1 <+88>:    callq  0x400b90 <__stack_chk_fail@plt>
   0x0000000000400fa6 <+93>:    add    $0x28,%rsp
   0x0000000000400faa <+97>:    pop    %rbx
   0x0000000000400fab <+98>:    pop    %rbp
   0x0000000000400fac <+99>:    retq   
End of assembler dump.

Solution

You should take on the problem one step at time.

First let's start by removing useless stuff from the dump (extra addresses that only add verbosity); I also like my assembly to be in Intel syntax, the memory accesses and the compares/subtractions read way more nicely.

From a quick glance, we can immediately observe:

 <+0>:    push   rbp        ; save clobbered registers
 <+1>:    push   rbx
 <+2>:    sub    rsp,0x28   ; 40 bytes of locals
 <+6>:    mov    rax,dword ptr fs:0x28  ; stuff referenced from fs are generally
                                        ; thread-local variables of some kind
<+15>:    mov    qword ptr[rsp+0x18],rax; at offset 0x18 there's a long long local
<+20>:    xor    eax,eax
<+22>:    mov    rsi,rsp    ; rsi in System V ABI is the first parameter;
                            ; it's passing straight the start of our locals as a
                            ; pointer
<+25>:    call   <read_six_numbers> ; we can imagine that read_six_numbers takes
                                    ; an array of 6 int values, which are probably
                                    ; the locals between rsp and rsp+0x18 (0x18 =
                                    ; 24 = 6*sizeof(int))
<+30>:    cmp    dword ptr[rsp],0x1 ; this is the first read value
<+34>:    je     <phase_2+41>       ; --\
<+36>:    call   <explode_bomb>     ;   |      first bomb explosion
<+41>:    mov    rbx,rsp            ; <-/   rbx points to the first number
<+44>:    lea    rbp,[rsp+0x14]     ; 0x14 = 20 = 5th element of the numbers array
<+49>:    mov    eax,dword ptr[rbx]
<+51>:    add    eax,eax
<+53>:    cmp    dword ptr[rbx+4],eax   ; rbx points to an integer, so +4 is
                                        ; really the integer that follows it
<+56>:    je     <phase_2+63>
<+58>:    call   <explode_bomb>     ; second bomb explosion
<+63>:    add    rbx,0x4    ; again, this is a +1 in C pointer notation
<+67>:    cmp    rbx,rbp
<+70>:    jne    <phase_2+49>
<+72>:    mov    rax,qword ptr[rsp+0x18]    ; again that long long
<+77>:    xor    rax,qword ptr fs:0x28      ; again that thread-local
<+86>:    je     <phase_2+93>
<+88>:    call   <__stack_chk_fail@plt> ; aaaah they have to do with stack smashing
                                        ; protection; we can ignore them, it's
                                        ; compiler-injected stuff
<+93>:    add    rsp,0x28               ; epilogue
<+97>:    pop    rbx
<+98>:    pop    rbp
<+99>:    ret

Also, we can reasonably assume that the function takes no parameters and returns no value, as it doesn't look at the initial state of rsi or to addresses above (=higher than) the initial value of rsp, and doesn't seem to leave rax in a particularly meaningful state.¹

Now, let's rewrite this in C, leaving the jumps as gotos for now, and leaving register names in lieu of meaningful variables names. We'll ignore completely fs:0x28 and rsp+0x18, as they are just canaries for the gcc-injected stack smashing protection.

void phase_2() {
    int numbers[6];
    read_six_numbers(numbers);
    if(numbers[0] == 1) goto l_41;
    explode_bomb();
l_41:
    int *rbx = &numbers[0];
    int *rbp = &numbers[5];
l_49:
    int eax = *rbx;
    eax += eax;
    if(eax == rbx[1]) goto l_63;
    explode_bomb();
l_63:
    rbx++;
    if(rbx != rbp) goto l_49;
}

The first trivial step is to rewrite the "short jumps" over the bomb explosions as ifs (inverting the condition), and the final goto as a do...while:

void phase_2() {
    int numbers[6];
    read_six_numbers(numbers);
    if(numbers[0] != 1) explode_bomb();
    int *rbx = &numbers[0];
    int *rbp = &numbers[5];
    do {
        int eax = *rbx;
        eax += eax;
        if(eax != rbx[1]) explode_bomb();
        rbx++;
    } while(rbx != rbp);
}

We can then clean up the remainings of the assembly origin by removing the extra variables and giving some more sensible names; also, that do...while can easily be rewritten as a for loop.

void phase_2() {
    int numbers[6];
    read_six_numbers(numbers);
    if(numbers[0] != 1) explode_bomb();
    for(int *ptr = &numbers[0], *end = &numbers[5]; ptr!=end; ++ptr) {
        if(ptr[0]*2 != ptr[1]) explode_bomb();
    }
}

Finally, if that's of any help to understand the algorithm, we can move from pointers to indexes, which is probably closer to how a human would write it:

void phase_2() {
    int numbers[6];
    read_six_numbers(numbers);
    if(numbers[0] != 1) explode_bomb();
    for(int i = 0; i!=5; ++i) {
        if(numbers[i]*2 != numbers[i+1]) explode_bomb();
    }
}

Hence, the secret combination here must be 1, 2, 4, 8, 16, 32, as each number must be twice the previous one, and the first one must be 1.

As a final bonus step, we can funnel the decompiled code back into gcc and discover that the result is almost identical to our disassembly — success!

Notes

Of course, as all assumptions, reasonable as they may be, it can be disproved by further examination, typically when you move up in the next steps and find out that there's stuff you hadn't considered before.

In this case the code is really simple, all my first educated guesses seem to hold well and the reversing process is just a sequence of steps where I'm gradually moving to higher level constructs, but keep in mind that reverse engineering is in general an iterative process, where you may discover later on that some of the working hypotheses you made to get started were actually wrong, and you may have to get back to an earlier stage to adjust them.

Also, making hypotheses such as these is pretty much unavoidable, even if you have the best knowledge of assembly and of what code compilers emit. Compilation is inherently a lossy process, so to reconstruct the high level constructs you always have to apply some heuristics and guesswork.