Edit: I want to test the system by inserting a breakpoint and comparing memory before and after the breakpoint.
I used static analysis to get a list of C source code locations, and the debugging information (i.e., DWARF) provides a mapping between C source lines and machine instructions in the executable.
But the problem is that many machine instructions map to a single line of C source code, and I would need to test all of them.
The only instructions I need to test are the ones that modify the memory state.
So I want to reduce the number of instructions by eliminating those that don't modify memory.
For example, I have the following source code, test.c, and I am interested in line number 5.
2 int var1 = 10;
3 void foo() {
4 int *var2 = (int*)malloc(sizeof(int));
5 for(*var2=var1;;) {
6 /* ... */
7 }
8 }
To be clear, line number 5 accesses the global memory var1 and the heap memory *var2.
I compiled the above program with the command gcc -g test.c, and the resulting disassembly (a.out) is:
00000000004004d6 <foo>:
4004d6: 55 push %rbp
4004d7: 48 89 e5 mov %rsp,%rbp
4004da: 48 83 ec 10 sub $0x10,%rsp
4004de: bf 04 00 00 00 mov $0x4,%edi
4004e3: e8 d8 fe ff ff callq 4003c0 <malloc@plt>
4004e8: 48 89 45 f8 mov %rax,-0x8(%rbp)
4004ec: 8b 15 1e 04 20 00 mov 0x20041e(%rip),%edx # 600910 <var1>
4004f2: 48 8b 45 f8 mov -0x8(%rbp),%rax
4004f6: 89 10 mov %edx,(%rax)
4004f8: eb fe jmp 4004f8 <foo+0x22>
Running dwarfdump -l a.out gives me the following result:
0x004004d6 [ 3, 0] NS uri: "/home/workspace/test.c"
0x004004de [ 4, 0] NS
0x004004ec [ 5, 0] NS
0x004004f8 [ 5, 0] DI=0x1
Now I know that, in a.out, the locations 0x4004ec, 0x4004f2, 0x4004f6 and 0x4004f8 are mapped to line number 5 in the C source code.
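The way those four addresses fall out of the line table can be sketched in a few lines of Python. This is only an illustration: the table rows and instruction addresses are copied by hand from the listings above, the names LINE_TABLE, INSN_ADDRESSES and line_for_address are made up for this example, and it assumes each row covers every address up to the start of the next row (end-of-sequence markers are not handled).

```python
from bisect import bisect_right

# (start_address, source_line) rows copied by hand from the dwarfdump -l output.
LINE_TABLE = [(0x4004d6, 3), (0x4004de, 4), (0x4004ec, 5), (0x4004f8, 5)]

# Instruction start addresses copied by hand from the disassembly of foo.
INSN_ADDRESSES = [
    0x4004d6, 0x4004d7, 0x4004da, 0x4004de, 0x4004e3,
    0x4004e8, 0x4004ec, 0x4004f2, 0x4004f6, 0x4004f8,
]


def line_for_address(address, table=LINE_TABLE):
    """Return the source line of the last row starting at or before address."""
    starts = [start for start, _ in table]
    index = bisect_right(starts, address) - 1
    if index < 0:
        return None  # address lies before the first row
    return table[index][1]


line5 = [hex(a) for a in INSN_ADDRESSES if line_for_address(a) == 5]
print(line5)  # ['0x4004ec', '0x4004f2', '0x4004f6', '0x4004f8']
```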
But I want to exclude 0x4004f8 (jmp), which doesn't access (heap, global or local) memory.
Does anyone know how to get only instructions that access memory?
This is only answering the question about finding asm instructions with explicit memory operands. The part about associating them with C statements is pretty bogus outside of -O0 compiler output (where each statement is compiled to a separate block of instructions to support GDB's jump to another line in the same function, or modifying variables in memory while stopped at a breakpoint). See Basile's answer, which tries to make some sense of the C statement stuff in the question.
Intel-syntax disassembly might be handy, because all explicit memory operands will have ptr in them, like mov rax, qword ptr [rbp - 0x8], so you can text search. (Search case-insensitively: GNU objdump prints it in upper case, as PTR.)
In asm source, the <size> ptr syntax isn't required when a register operand implies the operand size, but disassemblers like objdump -drwC -Mintel always put it in.
In AT&T syntax, you could also just look for () or a bare symbol name as an operand.
Don't forget to filter out lea instructions. lea is like the & operator in C. It's a shift-and-add instruction that uses memory-operand syntax and machine encoding.
Also don't forget to filter out various long-nop
instructions that use addressing modes to get the right amount of padding in one instruction. For example:
66 2e 0f 1f 84 00 00 00 00 00 nop WORD PTR cs:[rax+rax*1+0x0]
So if the mnemonic is lea or nop, ignore the instruction. (32-bit code sometimes uses other instructions as NOPs, but usually it's actually an lea that sets a register to itself in machine code generated by gas / ld from compiler .p2align directives.)
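As a rough sketch of that text search (not a robust disassembly parser), something like the following Python could filter objdump -d -w -Mintel output. It assumes objdump's usual tab-separated layout of address, raw bytes and instruction text. The helper names, the prefix list and the sample lines are made up for illustration, and the nop line in the sample is hypothetical rather than taken from the question's binary.

```python
import re

# Prefix tokens that objdump may print before the real mnemonic (assumed list).
PREFIXES = {"rep", "repz", "repe", "repnz", "repne", "lock", "data16",
            "addr32", "notrack", "bnd", "cs", "ds", "es", "fs", "gs", "ss"}
# Mnemonics that use memory-operand syntax without accessing memory.
IGNORED = {"lea", "nop"}


def split_insn(line):
    """Return (address, instruction_text) for an instruction line, else None."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 3:
        return None  # symbol label, blank line, or byte-continuation line
    try:
        address = int(parts[0].strip().rstrip(":"), 16)
    except ValueError:
        return None
    text = parts[2].split("#", 1)[0].strip()  # drop objdump's trailing comment
    return address, text


def mnemonic(text):
    """First token that is not a prefix."""
    for token in text.split():
        if token not in PREFIXES:
            return token
    return ""


def explicit_memory_insns(disasm_lines):
    """Keep instructions with a 'ptr' operand, skipping lea and nop."""
    kept = []
    for line in disasm_lines:
        parsed = split_insn(line)
        if parsed is None:
            continue
        address, text = parsed
        if mnemonic(text) in IGNORED:
            continue
        if re.search(r"\bptr\b", text, re.IGNORECASE):
            kept.append((address, text))
    return kept


SAMPLE = [
    "00000000004004d6 <foo>:",
    "  4004e8:\t48 89 45 f8          \tmov    QWORD PTR [rbp-0x8],rax",
    "  4004ec:\t8b 15 1e 04 20 00    \tmov    edx,DWORD PTR [rip+0x20041e]        # 600910 <var1>",
    "  4004f6:\t89 10                \tmov    DWORD PTR [rax],edx",
    "  4004f8:\teb fe                \tjmp    4004f8 <foo+0x22>",
    "  4004fa:\t66 0f 1f 44 00 00    \tnop    WORD PTR [rax+rax*1+0x0]",  # hypothetical padding
]
print([hex(addr) for addr, _ in explicit_memory_insns(SAMPLE)])
# ['0x4004e8', '0x4004ec', '0x4004f6']
```

In practice the lines would come from running objdump yourself and feeding its output to explicit_memory_insns. The filter says nothing about whether an operand is read or written.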
objdump disassembles rep stos with explicit operands, like rep stos QWORD PTR es:[rdi],rax. So you will actually get rep movs and rep stos operands. (Note that rep movs and rep cmps have two memory operands, unlike normal instructions. They're implicit in the machine code, but objdump makes them explicit.) This will also miss implicit memory operands like the stack for push / pop and call / ret.
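If you also care about those implicit stack accesses, one option is a small lookup by mnemonic next to the text search. The following is a minimal sketch: the table and the function name are invented for this example, it covers only push / pop / call / ret plus leave (which pops the saved frame pointer), and it assumes the AT&T size suffixes that objdump may append (as in callq or pushq) are a single trailing w, l or q.

```python
# Hypothetical table: mnemonic -> how the instruction touches the stack implicitly.
IMPLICIT_STACK_ACCESS = {
    "push": "write",   # stores the operand at the new top of the stack
    "pop": "read",     # loads the value at the top of the stack
    "call": "write",   # pushes the return address
    "ret": "read",     # pops the return address
    "leave": "read",   # restores rsp from rbp, then pops the saved rbp
}


def implicit_stack_access(mnemonic):
    """Return 'read', 'write', or None for an Intel- or AT&T-style mnemonic."""
    name = mnemonic.lower()
    # Strip one AT&T operand-size suffix (callq -> call, pushq -> push).
    if name not in IMPLICIT_STACK_ACCESS and name[-1:] in ("w", "l", "q"):
        name = name[:-1]
    return IMPLICIT_STACK_ACCESS.get(name)


for m in ("push", "callq", "retq", "jmp"):
    print(m, implicit_stack_access(m))
```

Other instructions with implicit memory operands (pushf / popf, enter, and so on) are not in this table and would need to be added by hand.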