I've just began learning about assembly code and I came across this binary bomb lab and felt it would be a good exercise.
This is phase 2 of the binary bomb and I need to figure out a 6 number password from this assembly code in order to move to the next phase.
I've been looking at it for a good while now and I can't seem to figure it out.
Any input or explanation on how to potentially solve this would be much appreciated. I would like to get a good understanding of this in order to move forward to the more complicated phases.
Dump of assembler code for function phase_2:
0x0000000000400f49 <+0>: push %rbp
0x0000000000400f4a <+1>: push %rbx
0x0000000000400f4b <+2>: sub $0x28,%rsp
0x0000000000400f4f <+6>: mov %fs:0x28,%rax
0x0000000000400f58 <+15>: mov %rax,0x18(%rsp)
0x0000000000400f5d <+20>: xor %eax,%eax
0x0000000000400f5f <+22>: mov %rsp,%rsi
0x0000000000400f62 <+25>: callq 0x401682 <read_six_numbers>
0x0000000000400f67 <+30>: cmpl $0x1,(%rsp) //first password number = 1?
0x0000000000400f6b <+34>: je 0x400f72 <phase_2+41>
0x0000000000400f6d <+36>: callq 0x40164c <explode_bomb>
0x0000000000400f72 <+41>: mov %rsp,%rbx
0x0000000000400f75 <+44>: lea 0x14(%rsp),%rbp
0x0000000000400f7a <+49>: mov (%rbx),%eax
=> 0x0000000000400f7c <+51>: add %eax,%eax
0x0000000000400f7e <+53>: cmp %eax,0x4(%rbx)
0x0000000000400f81 <+56>: je 0x400f88 <phase_2+63>
0x0000000000400f83 <+58>: callq 0x40164c <explode_bomb>
0x0000000000400f88 <+63>: add $0x4,%rbx
0x0000000000400f8c <+67>: cmp %rbp,%rbx
0x0000000000400f8f <+70>: jne 0x400f7a <phase_2+49> // loop? What is it doing?
0x0000000000400f91 <+72>: mov 0x18(%rsp),%rax
---Type <return> to continue, or q <return> to quit---return
0x0000000000400f96 <+77>: xor %fs:0x28,%rax
0x0000000000400f9f <+86>: je 0x400fa6 <phase_2+93>
0x0000000000400fa1 <+88>: callq 0x400b90 <__stack_chk_fail@plt>
0x0000000000400fa6 <+93>: add $0x28,%rsp
0x0000000000400faa <+97>: pop %rbx
0x0000000000400fab <+98>: pop %rbp
0x0000000000400fac <+99>: retq
End of assembler dump.
You should take on the problem one step at time.
First let's start by removing useless stuff from the dump (extra addresses that only add verbosity); I also like my assembly to be in Intel syntax, the memory accesses and the compares/subtractions read way more nicely.
From a quick glance, we can immediately observe:
<+0>: push rbp ; save clobbered registers
<+1>: push rbx
<+2>: sub rsp,0x28 ; 40 bytes of locals
<+6>: mov rax,dword ptr fs:0x28 ; stuff referenced from fs are generally
; thread-local variables of some kind
<+15>: mov qword ptr[rsp+0x18],rax; at offset 0x18 there's a long long local
<+20>: xor eax,eax
<+22>: mov rsi,rsp ; rsi in System V ABI is the first parameter;
; it's passing straight the start of our locals as a
; pointer
<+25>: call <read_six_numbers> ; we can imagine that read_six_numbers takes
; an array of 6 int values, which are probably
; the locals between rsp and rsp+0x18 (0x18 =
; 24 = 6*sizeof(int))
<+30>: cmp dword ptr[rsp],0x1 ; this is the first read value
<+34>: je <phase_2+41> ; --\
<+36>: call <explode_bomb> ; | first bomb explosion
<+41>: mov rbx,rsp ; <-/ rbx points to the first number
<+44>: lea rbp,[rsp+0x14] ; 0x14 = 20 = 5th element of the numbers array
<+49>: mov eax,dword ptr[rbx]
<+51>: add eax,eax
<+53>: cmp dword ptr[rbx+4],eax ; rbx points to an integer, so +4 is
; really the integer that follows it
<+56>: je <phase_2+63>
<+58>: call <explode_bomb> ; second bomb explosion
<+63>: add rbx,0x4 ; again, this is a +1 in C pointer notation
<+67>: cmp rbx,rbp
<+70>: jne <phase_2+49>
<+72>: mov rax,qword ptr[rsp+0x18] ; again that long long
<+77>: xor rax,qword ptr fs:0x28 ; again that thread-local
<+86>: je <phase_2+93>
<+88>: call <__stack_chk_fail@plt> ; aaaah they have to do with stack smashing
; protection; we can ignore them, it's
; compiler-injected stuff
<+93>: add rsp,0x28 ; epilogue
<+97>: pop rbx
<+98>: pop rbp
<+99>: ret
Also, we can reasonably assume that the function takes no parameters and returns no value, as it doesn't look at the initial state of rsi
or to addresses above (=higher than) the initial value of rsp
, and doesn't seem to leave rax
in a particularly meaningful state.1
Now, let's rewrite this in C, leaving the jumps as gotos for now, and leaving register names in lieu of meaningful variables names. We'll ignore completely fs:0x28
and rsp+0x18
, as they are just canaries for the gcc-injected stack smashing protection.
void phase_2() {
int numbers[6];
read_six_numbers(numbers);
if(numbers[0] == 1) goto l_41;
explode_bomb();
l_41:
int *rbx = &numbers[0];
int *rbp = &numbers[5];
l_49:
int eax = *rbx;
eax += eax;
if(eax == rbx[1]) goto l_63;
explode_bomb();
l_63:
rbx++;
if(rbx != rbp) goto l_49;
}
The first trivial step is to rewrite the "short jumps" over the bomb explosions as if
s (inverting the condition), and the final goto as a do
...while
:
void phase_2() {
int numbers[6];
read_six_numbers(numbers);
if(numbers[0] != 1) explode_bomb();
int *rbx = &numbers[0];
int *rbp = &numbers[5];
do {
int eax = *rbx;
eax += eax;
if(eax != rbx[1]) explode_bomb();
rbx++;
} while(rbx != rbp);
}
We can then clean up the remainings of the assembly origin by removing the extra variables and giving some more sensible names; also, that do
...while
can easily be rewritten as a for
loop.
void phase_2() {
int numbers[6];
read_six_numbers(numbers);
if(numbers[0] != 1) explode_bomb();
for(int *ptr = &numbers[0], *end = &numbers[5]; ptr!=end; ++ptr) {
if(ptr[0]*2 != ptr[1]) explode_bomb();
}
}
Finally, if that's of any help to understand the algorithm, we can move from pointers to indexes, which is probably closer to how a human would write it:
void phase_2() {
int numbers[6];
read_six_numbers(numbers);
if(numbers[0] != 1) explode_bomb();
for(int i = 0; i!=5; ++i) {
if(numbers[i]*2 != numbers[i+1]) explode_bomb();
}
}
Hence, the secret combination here must be 1, 2, 4, 8, 16, 32, as each number must be twice the previous one, and the first one must be 1.
As a final bonus step, we can funnel the decompiled code back into gcc and discover that the result is almost identical to our disassembly — success!
Of course, as all assumptions, reasonable as they may be, it can be disproved by further examination, typically when you move up in the next steps and find out that there's stuff you hadn't considered before.
In this case the code is really simple, all my first educated guesses seem to hold well and the reversing process is just a sequence of steps where I'm gradually moving to higher level constructs, but keep in mind that reverse engineering is in general an iterative process, where you may discover later on that some of the working hypotheses you made to get started were actually wrong, and you may have to get back to an earlier stage to adjust them.
Also, making hypotheses such as these is pretty much unavoidable, even if you have the best knowledge of assembly and of what code compilers emit. Compilation is inherently a lossy process, so to reconstruct the high level constructs you always have to apply some heuristics and guesswork.