is there a way to read given amount of instructions from a binary executable file on x86 architecture programmatically?
If I had a binary of a simple C program hello.c
:
#include <stdio.h>
int main(){
printf("Hello world\n");
return 0;
}
Where after compilation using gcc
, the disassembled function main
looks like this:
000000000000063a <main>:
63a: 55 push %rbp
63b: 48 89 e5 mov %rsp,%rbp
63e: 48 8d 3d 9f 00 00 00 lea 0x9f(%rip),%rdi # 6e4 <_IO_stdin_used+0x4>
645: e8 c6 fe ff ff callq 510 <puts@plt>
64a: b8 00 00 00 00 mov $0x0,%eax
64f: 5d pop %rbp
650: c3 retq
651: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
658: 00 00 00
65b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
Is there an easy way in C to read for example first three instructions (meaning the bytes 55, 48, 89, e5, 48, 8d, 3d, 9f, 00, 00, 00
) from main
? It is not guaranteed that the function looks like this - the first instructions may have all different opcodes and sizes.
this prints the 10 first bytes of the main
function by taking the address of the function and converting to a pointer of unsigned char
, print in hex.
This small snippet doesn't count the instructions. For this you would need an instruction size table (not very difficult, just tedious unless you find the table already done, What is the size of each asm instruction?) to be able to predict the size of each instruction given the first byte.
(unless of course, the processor you're targetting has a fixed instruction size, which makes the problem trivial to solve)
Debuggers have to decode operands as well, but in some cases like step or trace, I suspect they have a table handy to compute the next breakpoint address.
#include <stdio.h>
int main(){
printf("Hello world\n");
const unsigned char *start = (const char *)&main;
int i;
for (i=0;i<10;i++)
{
printf("%x\n",start[i]);
}
return 0;
}
output:
Hello world
55
89
e5
83
e4
f0
83
ec
20
e8
seems to match the disassembly :)
00401630 <_main>:
401630: 55 push %ebp
401631: 89 e5 mov %esp,%ebp
401633: 83 e4 f0 and $0xfffffff0,%esp
401636: 83 ec 20 sub $0x20,%esp
401639: e8 a2 01 00 00 call 4017e0 <___main>