c gcc assembly disassembly instructions

Locate the machine instructions that access memory in an executable


Edit: I want to test the system by inserting a breakpoint and comparing memory before and after the breakpoint.
I used static analysis to get a list of C source code locations, and the debugging information (i.e., DWARF) provides a mapping between C source lines and machine instructions in the executable.
The problem is that many machine instructions map to a single line of C source code, and I would need to test all of them.
The instructions I need to test are the ones that modify memory state, so I want to reduce the number of instructions by eliminating those that don't modify memory.

For example, take the following source file test.c, where the location of interest is line 5.

2   int var1 = 10;
3   void foo() {
4       int *var2 = (int*)malloc(sizeof(int));
5       for(*var2=var1;;) {
6       /* ... */
7       }
8   }

To be clear, line number 5 accesses the global memory var1 and the heap memory *var2.

I compiled the above program with the command gcc -g test.c and the result is

(a.out)
00000000004004d6 <foo>:
  4004d6:   55                      push   %rbp
  4004d7:   48 89 e5                mov    %rsp,%rbp
  4004da:   48 83 ec 10             sub    $0x10,%rsp
  4004de:   bf 04 00 00 00          mov    $0x4,%edi
  4004e3:   e8 d8 fe ff ff          callq  4003c0 <malloc@plt>
  4004e8:   48 89 45 f8             mov    %rax,-0x8(%rbp)
  4004ec:   8b 15 1e 04 20 00       mov    0x20041e(%rip),%edx        # 600910 <var1>
  4004f2:   48 8b 45 f8             mov    -0x8(%rbp),%rax
  4004f6:   89 10                   mov    %edx,(%rax)
  4004f8:   eb fe                   jmp    4004f8 <foo+0x22>

and dwarfdump -l a.out gives me the following result.

0x004004d6  [   3, 0] NS uri: "/home/workspace/test.c"
0x004004de  [   4, 0] NS
0x004004ec  [   5, 0] NS
0x004004f8  [   5, 0] DI=0x1

Now I know that, in a.out, the instructions at 0x4004ec, 0x4004f2, 0x4004f6 and 0x4004f8 are mapped to line 5 of the C source code.
But I want to exclude the one at 0x4004f8 (jmp), which doesn't access any (heap, global or local) memory.
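The address-to-line lookup described above can be sketched in Python. This is only an illustration under stated assumptions: the regular expression is fitted to the four `dwarfdump -l` rows shown in the question, the helper names `parse_line_table` and `line_for_address` are made up for this example, and end-of-sequence rows are not handled. Each row of the line table marks where a run of instructions starts, so an instruction belongs to the last row whose address is at or before it.

```python
import bisect
import re

# Rows copied from the `dwarfdump -l a.out` output in the question.
LINE_TABLE = """\
0x004004d6  [   3, 0] NS uri: "/home/workspace/test.c"
0x004004de  [   4, 0] NS
0x004004ec  [   5, 0] NS
0x004004f8  [   5, 0] DI=0x1
"""

# Assumed row shape: address, then "[ line, column ]".
ROW = re.compile(r"^(0x[0-9a-fA-F]+)\s+\[\s*(\d+),\s*\d+\]")


def parse_line_table(text):
    """Return a sorted list of (start_address, source_line) rows."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows.append((int(m.group(1), 16), int(m.group(2))))
    return sorted(rows)


def line_for_address(rows, addr):
    """Source line of the last row starting at or before addr, else None."""
    starts = [start for start, _ in rows]
    i = bisect.bisect_right(starts, addr) - 1
    return rows[i][1] if i >= 0 else None


rows = parse_line_table(LINE_TABLE)

# Instruction start addresses taken from the disassembly of foo above.
insn_addrs = [0x4004d6, 0x4004d7, 0x4004da, 0x4004de, 0x4004e3,
              0x4004e8, 0x4004ec, 0x4004f2, 0x4004f6, 0x4004f8]

line5 = [hex(a) for a in insn_addrs if line_for_address(rows, a) == 5]
print(line5)
```

For the listing in the question this prints the four addresses attributed to line 5, including the jmp at 0x4004f8 that still has to be filtered out by some other criterion.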

Does anyone know how to get only instructions that access memory?


Solution

  • This only answers the question about finding asm instructions with explicit memory operands. The part about associating them with C statements is pretty bogus outside of -O0 compiler output (where each statement is compiled to a separate block of instructions to support GDB's jump to another line in the same function, or modifying variables in memory while stopped at a breakpoint). See Basile's answer, which tries to make some sense of the C statement stuff in the question.


    Intel-syntax disassembly might be handy, because all explicit memory operands will have ptr in them, like mov rax, qword ptr [rbp - 0x8], so you can text search.

    In asm source, the <size> ptr syntax isn't required when a register operand implies the operand size, but disassemblers like objdump -drwC -Mintel always put it in.

    In AT&T syntax, you could also just look for () or a bare symbol name as an operand.
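A rough Python sketch of the AT&T-syntax check, under some assumptions: the input is a mnemonic plus the operand text as GNU objdump prints it, the function names are invented for this example, and a bare address on a jump or call is treated as a branch target rather than a memory operand. It does not special-case lea or nop.

```python
import re


def split_operands(ops):
    """Split operand text on commas that are not inside parentheses."""
    out, depth, cur = [], 0, ""
    for ch in ops:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            out.append(cur.strip())
            cur = ""
        else:
            cur += ch
    if cur.strip():
        out.append(cur.strip())
    return out


def att_has_memory_operand(mnemonic, operands):
    """True if any operand looks like an explicit memory operand."""
    ops = operands.split("#", 1)[0]      # drop objdump's trailing comment
    ops = re.sub(r"<[^>]*>", "", ops)    # drop <symbol+offset> annotations
    is_branch = mnemonic.startswith(("j", "call", "loop"))
    for op in split_operands(ops):
        if "(" in op:
            return True                  # base/index addressing, e.g. -0x8(%rbp)
        if op.startswith("%"):
            if ":" in op:
                return True              # segment override, e.g. %fs:0x28
            continue                     # plain register
        if op.startswith("$"):
            continue                     # immediate
        if op.startswith("*"):
            if not op.startswith("*%"):
                return True              # indirect branch through memory
            continue                     # indirect branch through a register
        if not is_branch:
            return True                  # bare address or symbol name
    return False


print(att_has_memory_operand("mov", "%edx,(%rax)"))
print(att_has_memory_operand("jmp", "4004f8 <foo+0x22>"))
```

Splitting on top-level commas matters because an operand such as (%rax,%rbx,4) contains commas of its own.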

    Don't forget to filter out lea instructions. lea is like the & operator in C: it's a shift-and-add instruction that uses memory-operand syntax and machine encoding, but it only computes the address and never accesses memory.

    Also don't forget to filter out various long-nop instructions that use addressing modes to get the right amount of padding in one instruction. For example:

    66 2e 0f 1f 84 00 00 00 00 00   nop    WORD PTR cs:[rax+rax*1+0x0]
    

    So if the mnemonic is lea or nop, ignore the instruction. (32-bit code sometimes uses other instructions as NOPs, but usually it's actually an lea that sets a register to itself in machine code generated by gas / ld from compiler .p2align directives.)
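Putting the Intel-syntax text search and the lea / nop rule together, a minimal Python sketch might look like the following. Several things here are assumptions rather than facts from the answer: the tab-separated line layout is the one GNU objdump normally produces, the prefix list is a guess at what may precede the mnemonic, the sample lines are a hand translation of the question's AT&T listing, and the addresses on the lea and nop lines are invented.

```python
import re

# Assumed layout: "  address:<TAB>hex bytes<TAB>instruction text".
LINE = re.compile(r"^\s*([0-9a-f]+):\t[0-9a-f ]+\t(.*)$")

# Words that may come before the mnemonic (assumed list, not exhaustive).
PREFIXES = {"rep", "repz", "repnz", "repe", "repne", "lock", "data16",
            "addr32", "cs", "ds", "es", "ss", "fs", "gs", "bnd", "notrack"}


def explicit_memory_insns(disassembly):
    """Return (address, text) for lines with a 'ptr' operand, minus lea/nop."""
    hits = []
    for line in disassembly.splitlines():
        m = LINE.match(line)
        if not m:
            continue                       # labels, blank lines, headers
        addr, insn = int(m.group(1), 16), m.group(2)
        words = insn.split()
        mnemonic = next((w for w in words if w not in PREFIXES), "")
        if mnemonic == "lea" or mnemonic.startswith("nop"):
            continue                       # memory syntax, but no access
        if re.search(r"\bptr\b", insn, re.IGNORECASE):
            hits.append((addr, insn))
    return hits


# Hand-translated sample (an assumption about the -Mintel rendering).
SAMPLE = "\n".join([
    "00000000004004d6 <foo>:",
    "  4004d6:\t55                   \tpush   rbp",
    "  4004e8:\t48 89 45 f8          \tmov    QWORD PTR [rbp-0x8],rax",
    "  4004ec:\t8b 15 1e 04 20 00    \tmov    edx,DWORD PTR [rip+0x20041e]        # 600910 <var1>",
    "  4004f2:\t48 8b 45 f8          \tmov    rax,QWORD PTR [rbp-0x8]",
    "  4004f6:\t89 10                \tmov    DWORD PTR [rax],edx",
    "  4004f8:\teb fe                \tjmp    4004f8 <foo+0x22>",
    "  4004fa:\t48 8d 45 f8          \tlea    rax,[rbp-0x8]",
    "  4004fe:\t66 2e 0f 1f 84 00 00 00 00 00 \tnop    WORD PTR cs:[rax+rax*1+0x0]",
])

for addr, text in explicit_memory_insns(SAMPLE):
    print(hex(addr), text)
```

The search is case-insensitive because some disassemblers print `qword ptr` and others `QWORD PTR`.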


    objdump disassembles rep stos with explicit operands, like rep stos QWORD PTR es:[rdi],rax. So a text search will actually find rep movs and rep stos operands. (Note that rep movs and rep cmps have two memory operands, unlike normal instructions. They're implicit in the machine code, but objdump makes them explicit.) A text search like this will still miss implicit memory operands, like the stack accesses of push / pop and call / ret.
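One way to cover the implicit stack accesses is a small mnemonic table checked alongside the operand search. The sketch below is an assumption-laden illustration: the function name is invented, only the four instructions named above are listed (others with implicit stack operands exist), and the optional `w` / `l` / `q` suffix is there for AT&T-style mnemonics such as pushq or retq. It reports whether the implicit access is a write or a read, since the question cares about instructions that modify memory.

```python
import re

# push and call store to the stack; pop and ret load from it.
_IMPLICIT = {"push": "write", "call": "write", "pop": "read", "ret": "read"}

# Exact match with an optional AT&T size suffix, so "popcnt" is not caught.
_PATTERN = re.compile(r"^(push|pop|call|ret)[wlq]?$")


def implicit_stack_access(mnemonic):
    """Return 'write', 'read', or None for the implicit stack operand."""
    m = _PATTERN.match(mnemonic.lower())
    return _IMPLICIT[m.group(1)] if m else None


for name in ("push", "callq", "retq", "mov", "popcnt"):
    print(name, implicit_stack_access(name))
```

Anchoring the pattern at both ends is deliberate: a plain prefix test would wrongly classify popcnt as a pop.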