How to read a binary executable by instructions?


Is there a way to programmatically read a given number of instructions from a binary executable file on the x86 architecture?

Suppose I have a binary of a simple C program, hello.c:

#include <stdio.h>

int main(){
    printf("Hello world\n");
    return 0;
}

After compilation with gcc, the disassembled function main looks like this:

000000000000063a <main>:
 63a:   55                      push   %rbp
 63b:   48 89 e5                mov    %rsp,%rbp
 63e:   48 8d 3d 9f 00 00 00    lea    0x9f(%rip),%rdi        # 6e4 <_IO_stdin_used+0x4>
 645:   e8 c6 fe ff ff          callq  510 <puts@plt>
 64a:   b8 00 00 00 00          mov    $0x0,%eax
 64f:   5d                      pop    %rbp
 650:   c3                      retq   
 651:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
 658:   00 00 00 
 65b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

Is there an easy way in C to read, for example, the first three instructions (meaning the bytes 55, 48, 89, e5, 48, 8d, 3d, 9f, 00, 00, 00) from main? The function is not guaranteed to look like this: the first instructions may have entirely different opcodes and sizes.


Solution

  • The snippet below prints the first 10 bytes of the main function: it takes the address of the function, converts it to a pointer to unsigned char, and prints each byte in hex.

    This small snippet doesn't count instructions. For that you would need an instruction size table (not very difficult, just tedious unless you find one already made; see "What is the size of each asm instruction?") to predict the size of each instruction from its leading bytes. On x86 the first byte alone is not always enough: the length also depends on prefixes and on the ModRM and SIB bytes that follow the opcode.

    (Unless, of course, the processor you're targeting has a fixed instruction size, which makes the problem trivial to solve.)

    Debuggers have to decode operands as well, but in some cases, like step or trace, I suspect they keep such a table handy to compute the next breakpoint address.

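    #include <stdio.h>

    int main(void){
        printf("Hello world\n");
        /* View the machine code of main as raw bytes. Converting a function
           pointer to an object pointer is not guaranteed by ISO C, but it
           works on common x86 platforms. */
        const unsigned char *start = (const unsigned char *)&main;
        int i;
        for (i=0;i<10;i++)
        {
           printf("%x\n",start[i]);   /* one byte per line, in hex */
        }
        return 0;
    }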
    

    Output:

    Hello world
    55
    89
    e5
    83
    e4
    f0
    83
    ec
    20
    e8
    

    This seems to match the disassembly of my build :) (it is a 32-bit build, so the bytes differ from the x86-64 ones in the question)

    00401630 <_main>:
      401630:   55                      push   %ebp
      401631:   89 e5                   mov    %esp,%ebp
      401633:   83 e4 f0                and    $0xfffffff0,%esp
      401636:   83 ec 20                sub    $0x20,%esp
      401639:   e8 a2 01 00 00          call   4017e0 <___main>
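
To give an idea of what the "instruction size table" approach could look like, here is a minimal sketch of a length decoder for 64-bit code. It is not a full x86 decoder: it only knows an optional REX prefix and a handful of one-byte opcodes (enough for the main shown in the question), and it returns 0 for anything else, including legacy prefixes and two-byte 0F opcodes. The names insn_length, insn_span and modrm_length are hypothetical, made up for this illustration. For real work a complete disassembler library is probably the safer choice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: size of the ModRM byte plus any SIB byte and
   displacement that follow it (32/64-bit addressing forms).
   Returns 0 if there are not enough bytes to inspect. */
static size_t modrm_length(const unsigned char *p, size_t avail)
{
    if (avail < 1) return 0;
    unsigned mod = p[0] >> 6;
    unsigned rm  = p[0] & 7;
    size_t len = 1;                        /* the ModRM byte itself */

    if (mod == 3) return len;              /* register operand, nothing more */
    if (rm == 4) {                         /* a SIB byte follows */
        if (avail < 2) return 0;
        len += 1;
        if (mod == 0 && (p[1] & 7) == 5)   /* SIB base 101: disp32, no base */
            len += 4;
    } else if (mod == 0 && rm == 5) {
        len += 4;                          /* RIP-relative disp32 */
    }
    if (mod == 1) len += 1;                /* disp8 */
    if (mod == 2) len += 4;                /* disp32 */
    return len;
}

/* Length in bytes of the instruction at `code`, or 0 if the opcode is
   outside the small subset handled here or the buffer is too short. */
size_t insn_length(const unsigned char *code, size_t avail)
{
    assert(code != NULL);
    size_t pos = 0, imm = 0;
    int rex_w = 0, has_modrm = 0;

    if (avail == 0) return 0;
    if ((code[pos] & 0xF0) == 0x40) {      /* REX prefix (64-bit mode only) */
        rex_w = (code[pos] & 0x08) != 0;
        pos++;
        if (pos >= avail) return 0;
    }

    unsigned char op = code[pos++];
    if ((op >= 0x50 && op <= 0x5F) || op == 0x90 || op == 0xC3) {
        /* push/pop reg, nop, ret: opcode only */
    } else if (op == 0xE8 || op == 0xE9) {
        imm = 4;                           /* call/jmp rel32 */
    } else if (op >= 0xB8 && op <= 0xBF) {
        imm = rex_w ? 8 : 4;               /* mov reg, imm32 (imm64 with REX.W) */
    } else if (op == 0x83) {
        has_modrm = 1; imm = 1;            /* group 1: r/m, imm8 */
    } else if (op == 0x81 || op == 0xC7) {
        has_modrm = 1; imm = 4;            /* r/m, imm32 (C7 assumed to be /0) */
    } else {
        switch (op) {                      /* ModRM, no immediate */
        case 0x01: case 0x03: case 0x29: case 0x2B: case 0x31: case 0x33:
        case 0x39: case 0x3B: case 0x85: case 0x89: case 0x8B: case 0x8D:
            has_modrm = 1;
            break;
        default:
            return 0;                      /* not in this sketch's table */
        }
    }

    if (has_modrm) {
        size_t m = modrm_length(code + pos, avail - pos);
        if (m == 0) return 0;
        pos += m;
    }
    pos += imm;
    return pos <= avail ? pos : 0;
}

/* Total number of bytes covered by the first `count` instructions,
   or 0 if any of them cannot be decoded. */
size_t insn_span(const unsigned char *code, size_t avail, size_t count)
{
    size_t total = 0;
    while (count-- > 0) {
        size_t n = insn_length(code + total, avail - total);
        if (n == 0) return 0;
        total += n;
    }
    return total;
}
```

With the bytes of main from the question (55 48 89 e5 48 8d 3d 9f 00 00 00 ...), insn_span(bytes, size, 3) gives 11, which is exactly the byte list the question asks for. Returning 0 on unknown opcodes is a deliberate choice: guessing a length and continuing would desynchronize every instruction after it.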