
IA32 Assembly cmp on strings


I'm having some trouble reverse engineering some IA32 assembly code. Namely, these lines:

   0x08049d6d <+206>:   mov    -0xc(%ebp),%edx
   0x08049d70 <+209>:   mov    -0x14(%ebp),%eax
   0x08049d73 <+212>:   mov    %edx,%ecx
   0x08049d75 <+214>:   sub    %eax,%ecx
   0x08049d77 <+216>:   mov    %ecx,%eax
   0x08049d79 <+218>:   cmp    $0x5,%eax
   0x08049d7c <+221>:   je     0x8049d83 <level_6+228>

Here, the $edx register is holding a string and the $eax register is holding the same string, only with the character at index 0 removed.

The confusion arises from the sub instruction at offset <+214>. No matter what characters the two strings start with, the result always seems to be 1. Is it comparing the lengths of the strings?

Additionally what does calling cmp on two strings compare?

Many thanks!

EDIT:

Earlier, two strings are being cmp'd:

   0x08049d68 <+201>:   cmp    -0xc(%ebp),%eax
   0x08049d6b <+204>:   jb     0x8049cfa <level_6+91>

Solution

  • It feels like we've been here before.

    Registers can't "hold strings", unless they're very short ones. What you've got there are addresses of strings, or pointers to (the first characters of) strings. If %edx holds a pointer to the string, and %eax holds "the same string, only with the character at index 0 removed", then %eax almost certainly points to the second character of the string.

    That being the case, if you subtract one from the other, of course you're always going to get 1, because the second character of the string is always one byte further along than the first character. It doesn't matter what the characters are, because you're comparing addresses.

    The cmp instruction is clearly not comparing two strings - it's comparing the literal number 5 with the contents of %eax, which at that point will be your 1, the difference between the two pointers. So cmp compares 5 with 1, and if they were equal - which they're obviously not, in this case - the je that follows would jump to 0x8049d83.

    That being said, I suspect you have this back to front. The mov/sub pair computes %edx minus %eax, so if %edx points to the start of the string and %eax points to the second character, the sub instruction should be giving you -1, not 1. %edx and %eax are probably the other way round. This routine seems to be designed to jump to 0x8049d83 when %edx points to the sixth character of the string that %eax points to, i.e. when the two pointers are exactly 5 bytes apart.