c windows assembly decompiling objdump

map exe decompilation back to C language


I'm pretty new to assembly, and am trying my best to learn it. I'm taking a course to learn it, and they mentioned a very remedial Hello World example, which I decompiled.

original c file:

#include <stdio.h>
int main()
{
printf("Hello Students!");
return 0;
}

This was decompiled using the following command:

C:> objdump -d -Mintel HelloStudents.exe > disasm.txt

decompilation (assembly):

push ebp
mov  ebp, esp
and  esp, 0xfffffff0
sub esp, 0x10
call 401e80 <__main>
mov DWORD PTR [esp], 0x404000
call 4025f8 <_puts>
mov eax, 0x0
leave
ret

I'm having issues mapping this output from the decompilation to the original C file. Can someone help?

Thank you very much!


Solution

  • The technical term for decompiling assembly back into C is "turning hamburger back into cows". The generated assembly will not be a 1-to-1 translation of the source, and depending on the level of optimization may be radically different. You will get something functionally equivalent to the original source, but how closely it resembles that source in structure is heavily variable.

    push ebp
    mov ebp, esp
    and esp, 0xfffffff0
    sub esp, 0x10
    

    This is all preamble, setting up the stack frame for the main function. It saves the caller's EBP, aligns the stack pointer (ESP) down to a 16-byte boundary by clearing its low four bits, then reserves another 16 bytes of space for outgoing function args.
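
The arithmetic done by the and/sub pair can be sketched in C. This is only an illustration: the helper names align_down_16 and prologue_esp are made up for this example, and the ESP values you pass in are arbitrary, not taken from a real run.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors `and esp, 0xfffffff0`: clearing the low 4 bits rounds
   the value down to a multiple of 16. */
uint32_t align_down_16(uint32_t esp)
{
    return esp & 0xfffffff0u;
}

/* Mirrors `and esp, 0xfffffff0` followed by `sub esp, 0x10`:
   align first, then reserve 16 bytes for outgoing arguments. */
uint32_t prologue_esp(uint32_t esp_after_mov)
{
    uint32_t aligned = align_down_16(esp_after_mov);
    assert(aligned >= 0x10u);  /* toy-model guard against wrap-around */
    return aligned - 0x10u;
}
```

For example, a hypothetical ESP of 0x0028ff28 is rounded down to 0x0028ff20 and then lowered to 0x0028ff10, leaving [esp] through [esp+15] free for arguments.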

    call 401e80 <___main>
    

    This function call to ___main is how MinGW arranges for libc initialization functions to run at the start of the program, making sure stdio buffers are allocated and stuff like that.


    That's the end of the preamble; the part of the function that implements the C statements in your source starts with:

    mov DWORD PTR [esp], 0x404000
    

    This writes the address of the string literal "Hello Students!" onto the stack. Combined with the earlier sub esp, 16, this is like a push instruction. In this 32-bit calling convention, function args are passed on the stack, not registers, so that's where the compiler has to put them before function calls.

    call 4025f8 <_puts>
    

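    This calls the puts function. The compiler realized that you weren't doing any format processing in the printf call and replaced it with the simpler puts call. GCC makes this substitution when the format string contains no conversion specifiers and ends in a newline, since puts appends a newline itself, so the string that was actually compiled presumably ended in \n.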
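
As a rough illustration of when that substitution is possible, here is a simplified C predicate. It is a sketch of the commonly observed GCC rule, not the compiler's actual implementation, and the name printf_can_become_puts is invented for this example. It only models a printf call with a single constant-string argument.

```c
#include <assert.h>
#include <string.h>

/* Simplified model: printf(fmt) with no further arguments can be
   rewritten as puts() when fmt has no '%' and ends in '\n', because
   puts() supplies the trailing newline itself.
   Strings shorter than two characters are rejected here; a compiler
   may handle those differently (for example with putchar). */
int printf_can_become_puts(const char *fmt)
{
    assert(fmt != NULL);
    size_t len = strlen(fmt);
    if (len < 2)
        return 0;
    if (strchr(fmt, '%') != NULL)
        return 0;               /* needs real format processing */
    return fmt[len - 1] == '\n';
}
```

Under this model, "Hello Students!\n" qualifies, while "Hello Students!" (no trailing newline) and "%d\n" do not.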

    mov eax, 0x0
    

    The return value of main (the 0 from return 0;) is loaded into the eax register.

    leave
    ret
    

    leave tears down the stack frame: it copies EBP back into ESP, discarding the aligned and reserved space, then pops the saved EBP value. ret then exits the function. ret pops a return address off the stack, which can only work when ESP is pointing at the return address.
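
To see why ESP ends up pointing at the return address again, the whole sequence can be replayed on a toy model in C. Everything here is hypothetical scaffolding for illustration: struct cpu, push32, pop32 and run_main are invented names, memory is a small array of dwords indexed by address / 4, and the two calls are skipped because they leave ESP and EBP unchanged once they return.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

struct cpu {
    uint32_t esp, ebp, eax;
    uint32_t mem[STACK_WORDS];  /* mem[i] models the dword at address 4*i */
};

void push32(struct cpu *c, uint32_t v)
{
    assert(c->esp >= 4 && c->esp / 4 <= STACK_WORDS);
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

uint32_t pop32(struct cpu *c)
{
    assert(c->esp / 4 < STACK_WORDS);
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;
    return v;
}

/* Replays the instructions of main; returns the address `ret` jumps to. */
uint32_t run_main(struct cpu *c, uint32_t string_addr)
{
    push32(c, c->ebp);               /* push ebp                      */
    c->ebp = c->esp;                 /* mov  ebp, esp                 */
    c->esp &= 0xfffffff0u;           /* and  esp, 0xfffffff0          */
    c->esp -= 0x10u;                 /* sub  esp, 0x10                */
    /* call ___main: skipped in this model */
    c->mem[c->esp / 4] = string_addr;/* mov  DWORD PTR [esp], addr    */
    /* call _puts: skipped in this model */
    c->eax = 0;                      /* mov  eax, 0x0                 */
    c->esp = c->ebp;                 /* leave, part 1: mov esp, ebp   */
    c->ebp = pop32(c);               /* leave, part 2: pop ebp        */
    return pop32(c);                 /* ret: pop the return address   */
}
```

If the caller's side is modeled by pushing a return address before calling run_main, the function returns that same address and leaves ESP and EBP exactly as the caller had them, whatever the starting alignment was.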