Reverse engineering C object files


I'm practicing reverse engineering C object files. Suppose I have an object file compiled from the following C program:

#include <stdio.h>
#include <string.h>

int main (int argc, char ** argv) {
  char * input = argv[1];
  int result = strcmp(input, "text_to_compare");
  
  if (result == 0) {
      printf("%s\n", "text matches");
  }
  else {
      printf("%s\n", "text doeesn't match");
  }
  
  return 0;
}

How would I go about finding "text_to_compare" in the object file, given that it was compiled with the -g flag for the x86-64 architecture?


Solution

  • Running strings on a binary file will print all sequences of four or more printable characters in the file. For a simple file this might be sufficient, but for a larger file you can end up with a lot of false positives. For example, compiling your code with gcc and running strings on the resulting binary returns 295 results.
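
    To make the first point concrete, here is a minimal sketch in C of the kind of scan strings performs: it reports every run of at least min_len printable bytes. The names print_strings and is_printable are made up for this illustration, and the definition of "printable" (ASCII 0x20 to 0x7e plus tab) is an assumption; the real tool has more options and handles other encodings.

    ```c
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical helper: which bytes count as "printable" in this sketch
       (assumption: ASCII space through '~', plus tab). */
    static int is_printable(unsigned char c) {
        return c == '\t' || (c >= 0x20 && c <= 0x7e);
    }

    /* Writes every run of at least min_len printable bytes in buf to out,
       one run per line, and returns the number of runs written.
       min_len is expected to be at least 1. */
    size_t print_strings(const unsigned char *buf, size_t len,
                         size_t min_len, FILE *out) {
        size_t count = 0;
        size_t start = 0;              /* first byte of the current run */
        for (size_t i = 0; i <= len; i++) {
            if (i < len && is_printable(buf[i]))
                continue;              /* still inside a printable run */
            /* The run [start, i) has just ended. */
            if (i - start >= min_len) {
                fwrite(buf + start, 1, i - start, out);
                fputc('\n', out);
                count++;
            }
            start = i + 1;             /* next run begins after this byte */
        }
        return count;
    }
    ```

    Every run that happens to be long enough is reported, whether or not the program ever uses it as a string, which is where the false positives come from.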

    We can start by using the objdump command to disassemble the code in your sample file:

    $ objdump --disassemble=main a.out
    
    a.out:     file format elf64-x86-64
    
    
    Disassembly of section .init:
    
    Disassembly of section .plt:
    
    Disassembly of section .text:
    
    0000000000401136 <main>:
      401136:       55                      push   %rbp
      401137:       48 89 e5                mov    %rsp,%rbp
      40113a:       48 83 ec 20             sub    $0x20,%rsp
      40113e:       89 7d ec                mov    %edi,-0x14(%rbp)
      401141:       48 89 75 e0             mov    %rsi,-0x20(%rbp)
      401145:       48 8b 45 e0             mov    -0x20(%rbp),%rax
      401149:       48 8b 40 08             mov    0x8(%rax),%rax
      40114d:       48 89 45 f8             mov    %rax,-0x8(%rbp)
      401151:       48 8b 45 f8             mov    -0x8(%rbp),%rax
      401155:       be 10 20 40 00          mov    $0x402010,%esi
      40115a:       48 89 c7                mov    %rax,%rdi
      40115d:       e8 de fe ff ff          call   401040 <strcmp@plt>
      401162:       89 45 f4                mov    %eax,-0xc(%rbp)
      401165:       83 7d f4 00             cmpl   $0x0,-0xc(%rbp)
      401169:       75 0c                   jne    401177 <main+0x41>
      40116b:       bf 20 20 40 00          mov    $0x402020,%edi
      401170:       e8 bb fe ff ff          call   401030 <puts@plt>
      401175:       eb 0a                   jmp    401181 <main+0x4b>
      401177:       bf 2d 20 40 00          mov    $0x40202d,%edi
      40117c:       e8 af fe ff ff          call   401030 <puts@plt>
      401181:       b8 00 00 00 00          mov    $0x0,%eax
      401186:       c9                      leave
      401187:       c3                      ret
    
    Disassembly of section .fini:
    

    Looking at the disassembly, we can see a call to strcmp at address 0x40115d:

    40115d:       e8 de fe ff ff          call   401040 <strcmp@plt>
    

    If we look a couple of lines before that, we can see an instruction that loads an address from outside of this section (0x402010) into %esi, the register that carries the second argument to strcmp:

    401155:       be 10 20 40 00          mov    $0x402010,%esi
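
    The two instructions above can also be decoded by hand from their raw bytes. The sketch below assumes only the encodings seen in this listing: opcode 0xbe followed by a 32-bit little-endian immediate for mov $imm32,%esi, and opcode 0xe8 followed by a signed 32-bit displacement, relative to the next instruction, for call. The function names are hypothetical.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Assembles four little-endian bytes into a 32-bit value. */
    uint32_t read_le32(const uint8_t *p) {
        return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
               (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    }

    /* If insn starts with "mov $imm32,%esi" (opcode 0xbe), stores the
       immediate in *imm and returns 1; otherwise returns 0. */
    int decode_mov_imm32_esi(const uint8_t *insn, size_t len, uint32_t *imm) {
        if (len < 5 || insn[0] != 0xbe)
            return 0;
        *imm = read_le32(insn + 1);
        return 1;
    }

    /* If insn is a 5-byte "call rel32" (opcode 0xe8) located at insn_addr,
       stores the call target in *target and returns 1; otherwise returns 0. */
    int decode_call_rel32(uint64_t insn_addr, const uint8_t *insn, size_t len,
                          uint64_t *target) {
        if (len < 5 || insn[0] != 0xe8)
            return 0;
        uint64_t rel = read_le32(insn + 1);
        if (rel & 0x80000000u)                 /* sign-extend to 64 bits */
            rel |= 0xffffffff00000000ull;
        *target = insn_addr + 5 + rel;         /* relative to next instruction */
        return 1;
    }
    ```

    For the bytes be 10 20 40 00 this yields 0x402010, and for e8 de fe ff ff at 0x40115d it yields 0x401040, the strcmp@plt entry that objdump prints.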
    

    If we look at the output of objdump -h a.out, we see that this address falls in the .rodata section (we're looking for the section whose address range, which starts at the value in the VMA column and spans Size bytes, contains the given address):

    $ objdump -h a.out
    Idx Name          Size      VMA               LMA               File off  Algn
    [...]
     15 .rodata       00000041  0000000000402000  0000000000402000  00002000  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
    [...]
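
    The containment check, and the file offset that goes with it, are simple arithmetic. A sketch in C, using a hypothetical struct section that holds only the columns of the objdump -h output needed here:

    ```c
    #include <stdint.h>

    /* Hypothetical record of one row of "objdump -h" output. */
    struct section {
        const char *name;
        uint64_t size;      /* Size column */
        uint64_t vma;       /* VMA column */
        uint64_t file_off;  /* File off column */
    };

    /* Returns 1 if addr lies in [vma, vma + size), 0 otherwise. */
    int section_contains(const struct section *s, uint64_t addr) {
        return addr >= s->vma && addr - s->vma < s->size;
    }

    /* Translates a virtual address inside the section to an offset in the
       file. The caller is expected to check section_contains() first. */
    uint64_t addr_to_file_offset(const struct section *s, uint64_t addr) {
        return s->file_off + (addr - s->vma);
    }
    ```

    With the .rodata values above (Size 0x41, VMA 0x402000, File off 0x2000), address 0x402010 lies inside the section and corresponds to file offset 0x2010.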
    

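    We can extract the data in that section using the objcopy command. The >(...) process substitution requires bash or a compatible shell, and xxd -o 0x402000 adds the section's VMA to the displayed offsets so that they line up with the addresses in the disassembly: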

    $ objcopy -j .rodata -O binary a.out >(xxd -o 0x402000)
    00402000: 0100 0200 0000 0000 0000 0000 0000 0000  ................
    00402010: 7465 7874 5f74 6f5f 636f 6d70 6172 6500  text_to_compare.
    00402020: 7465 7874 206d 6174 6368 6573 0074 6578  text matches.tex
    00402030: 7420 646f 6565 736e 2774 206d 6174 6368  t doeesn't match
    00402040: 00                                       .
    

    And we can see that the string at address 0x402010 is text_to_compare.
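
    The final lookup can be sketched in C as well, assuming the section bytes produced by objcopy have already been read into memory. string_at is a hypothetical helper, not part of binutils.

    ```c
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Returns a pointer to the NUL-terminated string stored at virtual
       address addr, given the section's bytes and the address (base) of its
       first byte. Returns NULL if addr is outside the section or if no
       terminator is found before the section ends. */
    const char *string_at(const uint8_t *data, size_t size,
                          uint64_t base, uint64_t addr) {
        if (addr < base || addr - base >= size)
            return NULL;
        size_t off = (size_t)(addr - base);
        if (memchr(data + off, '\0', size - off) == NULL)
            return NULL;
        return (const char *)(data + off);
    }
    ```

    Called with base 0x402000 and addr 0x402010 on the .rodata bytes shown above, it returns a pointer to text_to_compare.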