
Using GCC to produce readable assembly?


I was wondering how to use GCC on my C source file to dump a mnemonic version of the machine code, so I can see what my code is being compiled into. You can do this with Java, but I haven't been able to find a way with GCC.

I am trying to rewrite a C function in assembly, and seeing how GCC does it would be a big help.


Solution

  • If you compile with debug symbols (add -g to your GCC command line, even if you're also using -O3; see footnote 1), you can use objdump -S to produce a more readable disassembly interleaved with C source.

    > objdump --help
    [...]
    -S, --source             Intermix source code with disassembly
    -l, --line-numbers       Include line numbers and filenames in output
    

    objdump -drwC -Mintel is a nice combination (-d disassembles the executable sections). The other flags, plus two related ones:

    • -r shows symbol names on relocations (so you'd see puts in the call instruction below)
    • -R shows dynamic-linking relocations / symbol names (useful on shared libraries)
    • -C demangles C++ symbol names
    • -w is "wide" mode: it doesn't line-wrap the machine-code bytes
    • -Mintel: use GAS/binutils MASM-like .intel_syntax noprefix syntax instead of AT&T
    • -S: interleave source lines with disassembly.

    You could put something like alias disas="objdump -drwCS -Mintel" in your ~/.bashrc. If not on x86, or if you like AT&T syntax, omit -Mintel.


    Example:

    > gcc -g -c test.c
    > objdump -d -M intel -S test.o
    
    test.o:     file format elf32-i386
    
    
    Disassembly of section .text:
    
    00000000 <main>:
    #include <stdio.h>
    
    int main(void)
    {
       0:   55                      push   ebp
       1:   89 e5                   mov    ebp,esp
       3:   83 e4 f0                and    esp,0xfffffff0
       6:   83 ec 10                sub    esp,0x10
        puts("test");
       9:   c7 04 24 00 00 00 00    mov    DWORD PTR [esp],0x0
      10:   e8 fc ff ff ff          call   11 <main+0x11>
    
        return 0;
      15:   b8 00 00 00 00          mov    eax,0x0
    }
      1a:   c9                      leave  
      1b:   c3                      ret
    

    Note that this example doesn't use -r, so the call with rel32 = -4 isn't annotated with the puts symbol name. Instead it looks like a broken call that jumps into the middle of the call instruction in main (call 11 <main+0x11>). Remember that the rel32 displacement in the call encoding is just a placeholder until the linker fills in a real offset (to a PLT stub in this case, unless you statically link libc).
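
    The odd-looking target 11 follows directly from the encoding: a call rel32 jumps to the address of the next instruction plus the signed displacement, so here 0x10 + 5 - 4 = 0x11. A minimal C sketch of that arithmetic, for illustration only (call_target is a hypothetical helper name, not part of objdump or binutils, and it assumes 32-bit code like the listing above):

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical helper: target of an x86 `call rel32` (opcode 0xE8).
       The target is the address of the NEXT instruction plus the signed
       32-bit displacement stored in bytes 1..4 (little-endian). */
    static uint32_t call_target(uint32_t insn_addr, const uint8_t insn[5])
    {
        assert(insn[0] == 0xe8); /* only handles the 5-byte call rel32 form */

        uint32_t rel32 = (uint32_t)insn[1]
                       | (uint32_t)insn[2] << 8
                       | (uint32_t)insn[3] << 16
                       | (uint32_t)insn[4] << 24;

        /* 32-bit unsigned wraparound is equivalent to adding a signed rel32. */
        return insn_addr + 5 + rel32;
    }

    int main(void)
    {
        /* Bytes of the unrelocated call at offset 0x10 in the listing above:
           the displacement field holds the placeholder value -4. */
        const uint8_t insn[5] = { 0xe8, 0xfc, 0xff, 0xff, 0xff };

        printf("call %x\n", (unsigned)call_target(0x10, insn)); /* prints "call 11" */
        return 0;
    }
    ```

    The placeholder is -4 because the displacement field sits 4 bytes before the end of the instruction; once the linker applies the relocation, the same arithmetic yields the real destination.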


    Footnote 1: Interleaving source can be messy and not very helpful in optimized builds; for that, consider https://godbolt.org/ or other ways of visualizing which instructions go with which source lines. In optimized code there isn't always a single source line that accounts for an instruction, but the debug info will still pick one source line for each asm instruction.