Search code examples
assemblyx86gdbreverse-engineeringbinary-bomb

Binary bomb phase 4 assembly


I have a very similar question as "Binary Bomb - Phase 4" but it is still different enough that I'm not entirely sure what to do.

Here is my phase_4 code:

 08048d3e <phase_4>:
 8048d3e:       83 ec 2c                sub    $0x2c,%esp
 8048d41:       8d 44 24 18             lea    0x18(%esp),%eax
 8048d45:       89 44 24 0c             mov    %eax,0xc(%esp)
 8048d49:       8d 44 24 1c             lea    0x1c(%esp),%eax
 8048d4d:       89 44 24 08             mov    %eax,0x8(%esp)
 8048d51:       c7 44 24 04 75 a7 04    movl   $0x804a775,0x4(%esp)
 8048d58:       08
 8048d59:       8b 44 24 30             mov    0x30(%esp),%eax
 8048d5d:       89 04 24                mov    %eax,(%esp)
 8048d60:       e8 6b fb ff ff          call   80488d0 <__isoc99_sscanf@plt>
 8048d65:       83 f8 02                cmp    $0x2,%eax //making sure I have 2 inputs
 8048d68:       75 0e                   jne    8048d78 <phase_4+0x3a> //if not explodes bomb
 8048d6a:       8b 44 24 18             mov    0x18(%esp),%eax
 8048d6e:       83 f8 01                cmp    $0x1,%eax //has to be greater than 1
 8048d71:       7e 05                   jle    8048d78 <phase_4+0x3a> //otherwise jumps to bomb
 8048d73:       83 f8 04                cmp    $0x4,%eax
 8048d76:       7e 05                   jle    8048d7d <phase_4+0x3f> //has to be less than 4 or jumps to bomb
 8048d78:       e8 af 05 00 00          call   804932c <explode_bomb>
 8048d7d:       8b 44 24 18             mov    0x18(%esp),%eax
 8048d81:       89 44 24 04             mov    %eax,0x4(%esp)
 8048d85:       c7 04 24 09 00 00 00    movl   $0x9,(%esp)
 8048d8c:       e8 50 ff ff ff          call   8048ce1 <func4> //calls function 4
 8048d91:       3b 44 24 1c             cmp    0x1c(%esp),%eax
 8048d95:       74 05                   je     8048d9c <phase_4+0x5e> //compares two values and explodes bomb if not equal
 8048d97:       e8 90 05 00 00          call   804932c <explode_bomb>
 8048d9c:       83 c4 2c                add    $0x2c,%esp
 8048d9f:       90                      nop
 8048da0:       c3                      ret

And here is func_4 code:

 08048ce1 <func4>: //not entirely sure what's happening here but it might be a binary search?
 8048ce1:       83 ec 1c                sub    $0x1c,%esp
 8048ce4:       89 5c 24 10             mov    %ebx,0x10(%esp)
 8048ce8:       89 74 24 14             mov    %esi,0x14(%esp)
 8048cec:       89 7c 24 18             mov    %edi,0x18(%esp)
 8048cf0:       8b 74 24 20             mov    0x20(%esp),%esi
 8048cf4:       8b 5c 24 24             mov    0x24(%esp),%ebx
 8048cf8:       85 f6                   test   %esi,%esi
 8048cfa:       7e 2b                   jle    8048d27 <func4+0x46>
 8048cfc:       83 fe 01                cmp    $0x1,%esi
 8048cff:       74 2b                   je     8048d2c <func4+0x4b>
 8048d01:       89 5c 24 04             mov    %ebx,0x4(%esp)
 8048d05:       8d 46 ff                lea    -0x1(%esi),%eax
 8048d08:       89 04 24                mov    %eax,(%esp)
 8048d0b:       e8 d1 ff ff ff          call   8048ce1 <func4>
 8048d10:       8d 3c 18                lea    (%eax,%ebx,1),%edi
 8048d13:       89 5c 24 04             mov    %ebx,0x4(%esp)
 8048d17:       83 ee 02                sub    $0x2,%esi
 8048d1a:       89 34 24                mov    %esi,(%esp)
 8048d1d:       e8 bf ff ff ff          call   8048ce1 <func4>
 8048d22:       8d 1c 07                lea    (%edi,%eax,1),%ebx
 8048d25:       eb 05                   jmp    8048d2c <func4+0x4b>
 8048d27:       bb 00 00 00 00          mov    $0x0,%ebx
 8048d2c:       89 d8                   mov    %ebx,%eax
 8048d2e:       8b 5c 24 10             mov    0x10(%esp),%ebx
 8048d32:       8b 74 24 14             mov    0x14(%esp),%esi
 8048d36:       8b 7c 24 18             mov    0x18(%esp),%edi
 8048d3a:       83 c4 1c                add    $0x1c,%esp
 8048d3d:       c3                      ret

I checked to make sure that the input must be two decimals, and I can also see that at the end two numbers are being compared with one another (line 8048d97, 0x1c(%esp) and %eax). At the beginning of phase_4 I think the code is also indicating that the first number has to be between 1 and 4, and at the end of phase 4, however the number has been modified, it must equal the second number. Please correct me if I'm wrong.

I'm just not sure what the func_4 is doing, and how to determine what the inputs should be. I think it might be the binary search, but not sure how to check how it corresponds with the first input.


Solution

  • what the func_4 is doing, and how to determine what the inputs should be.

     8048d81:       89 44 24 04             mov    %eax,0x4(%esp)
         ; input value loaded to [esp+4] -> [esp+0x24] inside func4
     8048d85:       c7 04 24 09 00 00 00    movl   $0x9,(%esp)
         ; 9 stored to [esp+0] -> [esp+0x20] inside func4
     8048d8c:       e8 50 ff ff ff          call   8048ce1 <func4>
         ; something calculated
         ; then the result is compared with other input value, they should be equal
    

    I'm not sure, which input is which (you know the format string for sscanf and your ABI, so you can tell better, the one stored into [esp+0x18] and tested to be of value 2, 3 or 4 is used in calculation, the one stored into [esp+0x1c] is just compared. I would guess that input for calculation is second (for sscanf at +0x0c, the other one is at +0x08)? So password is "<func4(9,2-4)> <2-4>"?

    As that func4 is clean asm recursive code, you can just run it for all three possible inputs to see what result it produces (in debugger, stepping over instructions, so you manage to catch if some parameter leads to infinite loop or some other nastiness, plus you will get idea how it works).

    Then figure out which input is which.

    not sure what the func_4 is doing

    Hey, that's how assembly works. You rarely take a look on stream of instructions, and recognize the algorithm from them with a quick glance. More often you need to thoroughly emulate CPU state in your head instruction by instruction, paying attention to every detail (like flag change by instruction which looks to be used only for setting up a value, then few instructions later that preserved flag is used in conditional jump), and by keeping some timeline of calculated values it's usually possible to guess the algorithm implemented.

    Or if not algorithm, then at least to know what values are calculated.

    I checked the code one more time, and it looks like variation on Fibonnaci theme, like func4(0,x) = 0, func4(1,x) = x, and func4(y, x) = (just quick guess, too lazy to verify I got it right) x + func4(y-1, x) + func4(y-2, x).

    edit: I'm now sure that formula is not correct, it's missing at least one constant (but maybe it's wrong even more).


    So I wonder, are you too lazy to bother to emulate the CPU thoroughly and get to the correct result, or you have problem with some particular instruction, what it exactly does (ie. you are too lazy to read the instruction reference guide)? So it boils down to "are you lazy or lazy?" Because I'm definitely both, so this is my answer to your question. :)

    In case you don't understand some particular detail about some instruction, just ask.


    edit about comments:

    "//has to be less than 4 or jumps to bomb"

    No, it's jle, that's mnemonics for "jump signed less/equal than", so the correct comment is "has to be less than or equal 4" and that's valid for values [TYPE_MIN, 4].

    For "less than 4" you would need to use jl = "jump signed less".

    For "less than 4" unsigned there's jb = "jump below", that covers values [0, 3]. jbe covers [0, 4] values.

    If you will check Jcc description, you will see those conditional jumps are based only on few flags in the flag register, nothing more (except jcxz/jecxz, which compares cx register and doesn't touch flags at all).

    Also you may notice there are several aliases, so you can write in your code the one which semantically fits your purpose. For example jz and je is the same instruction, first alias stands for "jump when zero (flag)", the other is "jump when equal". So read through it and get used to those abbreviations, it will make the reading of disassembly somewhat easier.


    //calls function 4

    That's useless comment, that's obvious from the instruction itself. Would you add for example arguments (9, input_2), it would be of more benefit to the reader.

    The next "compare" comment is of similar useless nature. What two numbers? If you already did track them down, you should write that result into comment, so something like "compares result of func4 with input_1".


    I will try to show you as example the func4:

     8048ce1 <func4>:
      ; allocates 0x1c bytes on stack for local variables
      ; (so return address is at [esp+0x1c] now, was at [esp])
     8048ce1:       83 ec 1c                sub    $0x1c,%esp
      ; stores current ebx, esi and edi in [esp+0x10 / 0x14 / 0x18]
     8048ce4:       89 5c 24 10             mov    %ebx,0x10(%esp)
     8048ce8:       89 74 24 14             mov    %esi,0x14(%esp)
     8048cec:       89 7c 24 18             mov    %edi,0x18(%esp)
      ; esi = [esp+0x20] = first argument of func4(arg1, arg2)
     8048cf0:       8b 74 24 20             mov    0x20(%esp),%esi
      ; ebx = [esp+0x24] = second argument of func4(arg1, arg2)
     8048cf4:       8b 5c 24 24             mov    0x24(%esp),%ebx
      ; when esi(arg1) <= 0, jump to ExitWith0
     8048cf8:       85 f6                   test   %esi,%esi
     8048cfa:       7e 2b                   jle    8048d27 ExitWith0
      ; when esi(arg1) == 1, jump to ExitWithValueFromEbx
     8048cfc:       83 fe 01                cmp    $0x1,%esi
     8048cff:       74 2b                   je     8048d2c ExitWithValueFromEbx
      ; store arg2 in [esp+4]
     8048d01:       89 5c 24 04             mov    %ebx,0x4(%esp)
      ; store (arg1-1) in [esp] (preparing args for recursive call)
     8048d05:       8d 46 ff                lea    -0x1(%esi),%eax
     8048d08:       89 04 24                mov    %eax,(%esp)
      ; recursion: eax = func4(arg1-1, arg2) ; ebx/esi/edi preserved
     8048d0b:       e8 d1 ff ff ff          call   8048ce1 <func4>
      ; edi = eax (result) + ebx*1 (arg2)
     8048d10:       8d 3c 18                lea    (%eax,%ebx,1),%edi
      ; [esp+4] is set to arg2 again
      ; (it's still there, but this asm looks like debug level of C, not very efficient)
     8048d13:       89 5c 24 04             mov    %ebx,0x4(%esp)
      ; esi -= 2 (no more original arg1 in esi), and set [esp+0]
     8048d17:       83 ee 02                sub    $0x2,%esi
     8048d1a:       89 34 24                mov    %esi,(%esp)
      ; second recursive call: func4(arg1-2, arg2)
     8048d1d:       e8 bf ff ff ff          call   8048ce1 <func4>
      ; ebx = edi + eax*1 ; edi was arg2 + f4(arg1-1, arg2)
     8048d22:       8d 1c 07                lea    (%edi,%eax,1),%ebx
      ; so ebx = arg2 + f4(arg1-1, arg2) + f4(arg1-2, arg2), return that value
     8048d25:       eb 05                   jmp    8048d2c ExitWithValueFromEbx
    ExitWith0:
     8048d27:       bb 00 00 00 00          mov    $0x0,%ebx
    ExitWithValueFromEbx:
     8048d2c:       89 d8                   mov    %ebx,%eax
      ; restore values of ebx/esi/edi (return value is in eax)
     8048d2e:       8b 5c 24 10             mov    0x10(%esp),%ebx
     8048d32:       8b 74 24 14             mov    0x14(%esp),%esi
     8048d36:       8b 7c 24 18             mov    0x18(%esp),%edi
      ; restore esp value, so [esp] is return address, and return
     8048d3a:       83 c4 1c                add    $0x1c,%esp
     8048d3d:       c3                      ret
    

    So I actually guessed in correctly in my answer, those ,1) in lea confused me, I'm not used to AT&T syntax, so at a weaker moment I thought that's +1 to result, but it's *1 to index register (I'm used to Intel syntax, where it would look as lea ebx,[edi+eax]).

    But as you can see, once you start to take down the notes, and focus on single instruction only, I did manage to decipher it correctly this time.

    At the moment it's important for you to make sure you understand every instruction, I mean every detail, like how that lea works and why it doesn't read memory value, even if the argument is (...), and what are all possible addressing modes (how would it look if you would want to do the f(a, b) = 1 + b + f(a-1,b) + f(a-2,b)? Try to find that one, just one lea has to be modified (any of those two)).

    I'm not sure what literature you have for this, as I use Intel syntax everything (I think AT&T syntax is good for C compilers, but not so much for humans ... but overall it's not that bad, mostly annoying if you already know the other one, if you know only AT&T, it may be quite OK).

    In worst case ask, although questions about instructions purpose are "low effort", as all the x86 manuals are freely available, but if you are not sure what some wording really means, and running such instruction few times in debugger doesn't help, you have to ask. Just add what you think and what words are confusing you.