
Compiler optimization of strcmp against a constant string that I don't understand


To improve my binary exploitation skills and deepen my understanding of low-level environments, I tried solving challenges on pwnable.kr. The first challenge, called fd, has the following C code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>   /* read() */
char buf[32];
int main(int argc, char* argv[], char* envp[]){
        if(argc<2){
                printf("pass argv[1] a number\n");
                return 0;
        }
        int fd = atoi( argv[1] ) - 0x1234;
        int len = 0;
        len = read(fd, buf, 32);
        if(!strcmp("LETMEWIN\n", buf)){
                printf("good job :)\n");
                system("/bin/cat flag");
                exit(0);
        }
        printf("learn about Linux file IO\n");
        return 0;

}

I disassembled it with objdump -S -g ./fd and got confused: instead of calling a strcmp function, the code compares the strings directly. This is the assembly I'm talking about:

 80484c6:   e8 05 ff ff ff          call   80483d0 <atoi@plt> 
 80484cb:   2d 34 12 00 00          sub    eax,0x1234 
 ; eax = atoi( argv[1] ) - 0x1234;
 ; initialize fd=eax
 80484d0:   89 44 24 18             mov    DWORD PTR [esp+0x18],eax
 ; initialize len
 80484d4:   c7 44 24 1c 00 00 00    mov    DWORD PTR [esp+0x1c],0x0
 80484db:   00

 ; Set up read variables
 80484dc:   c7 44 24 08 20 00 00    mov    DWORD PTR [esp+0x8],0x20 ; read 32 bytes
 80484e3:   00 
 80484e4:   c7 44 24 04 60 a0 04    mov    DWORD PTR [esp+0x4],0x804a060 ; buf variable address
 80484eb:   08 
 80484ec:   8b 44 24 18             mov    eax,DWORD PTR [esp+0x18]
 80484f0:   89 04 24                mov    DWORD PTR [esp],eax ; fd variable
 80484f3:   e8 78 fe ff ff          call   8048370 <read@plt> 

 80484f8:   89 44 24 1c             mov    DWORD PTR [esp+0x1c],eax
 80484fc:   ba 46 86 04 08          mov    edx,0x8048646 ; "LETMEWIN\n" address
 8048501:   b8 60 a0 04 08          mov    eax,0x804a060 ; buf address
 8048506:   b9 0a 00 00 00          mov    ecx,0xa ; what is this?
 ; strcmp starts here?
 804850b:   89 d6                   mov    esi,edx
 804850d:   89 c7                   mov    edi,eax
 804850f:   f3 a6                   repz cmps BYTE PTR ds:[esi],BYTE PTR es:[edi] ; <------- ?STRCMP?

The things I don't understand are:

  1. Where is the strcmp call? And why is it like that?
  2. What does this 8048506: b9 0a 00 00 00 mov ecx,0xa do?

Solution

  • The compiler inlined strcmp against a string of known length using repe cmpsb (the same F3 A6 encoding that objdump prints as repz cmps), which effectively implements memcmp.

    It loads the address of the constant string literal "LETMEWIN\n" into the esi register (via edx). Note that this string is 10 bytes long: 9 characters plus the terminating '\0'. Then it loads the address of buf into the edi register (via eax) and executes the x86 instruction:

    repz cmps BYTE PTR ds:[esi],BYTE PTR es:[edi]
    

    repz repeats the following instruction while the zero flag is set, at most as many times as the value in ecx: ecx is decremented after each iteration and the loop stops when it reaches 0. This explains the mov ecx,0xa ; what is this? line.

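    The repeated instruction is cmps, which compares the byte at [esi] with the byte at [edi] by subtracting them, sets the flags accordingly (the zero flag is set when the two bytes are equal), and then advances both pointers by 1. The pointers are incremented rather than decremented because the direction flag is clear, which the calling convention guarantees on function entry.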

    So, to answer your questions:

    Where is the strcmp call? And why is it like that?

    There is no explicit call to strcmp; it has been optimized out and replaced with inline code:

     80484fc:   ba 46 86 04 08          mov    edx,0x8048646 ; "LETMEWIN\n" address
     8048501:   b8 60 a0 04 08          mov    eax,0x804a060 ; buf address
     8048506:   b9 0a 00 00 00          mov    ecx,0xa ; number of bytes to compare
     804850b:   89 d6                   mov    esi,edx
     804850d:   89 c7                   mov    edi,eax
     804850f:   f3 a6                   repz cmps BYTE PTR ds:[esi],BYTE PTR es:[edi]
    

    The listing stops before the part that checks whether the result of the comparison is zero; presumably it just wasn't copied into the question. Right after the repz cmps line there should be a conditional jump such as je/jz or jne/jnz, possibly preceded by a few instructions that turn the flags into strcmp's integer result.

    What does this 8048506: b9 0a 00 00 00 mov ecx,0xa do?

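    It sets the maximum number of bytes to compare: 10, the length of "LETMEWIN\n" including its terminating '\0'. Comparing the terminator too is what makes this fixed-length comparison equivalent to strcmp here: buf must not only start with "LETMEWIN\n" but also end right after it.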