Tags: linux, gcc, assembly, x86, glibc

Understanding assembly language _start label in a C program


I wrote a simple C program and was using GDB to debug it. I understand the purpose of the following instructions in the main function:

On entry

push   %ebp
mov    %esp,%ebp

On exit

leave
ret

Then I disassembled _start in GDB and got the following:

xor    %ebp,%ebp
pop    %esi
mov    %esp,%ecx
and    $0xfffffff0,%esp
push   %eax
push   %esp
push   %edx
push   $0x80484d0
push   $0x8048470
push   %ecx
push   %esi
push   $0x8048414
call   0x8048328 <__libc_start_main@plt>
hlt
nop
nop
nop
nop

I don't understand these lines or the logic behind them.

Can someone explain what the code in _start does?


Solution

  • Here is the well-commented assembly source of the code you posted.

    In summary, it does the following:

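    1. Establish a sentinel stack frame with ebp = 0, so that code which walks the stack (a debugger's backtrace, for example) can easily find where it ends.
    2. Pop the number of command-line arguments (argc) into esi so it can be passed to __libc_start_main. After the pop, esp points at the argument vector (argv), and mov %esp,%ecx saves that pointer in ecx.
    3. Align the stack pointer to a multiple of 16 bytes (and $0xfffffff0,%esp clears the low four bits) to comply with the ABI. The initial stack is not guaranteed to be aligned this way on every version of Linux, so it is done manually just in case.
    4. Push the arguments to __libc_start_main in reverse order: a padding word from eax (7 arguments plus this word make 32 bytes, so the stack is still 16-byte aligned at the call), the stack end (esp), the dynamic linker's termination function that the ABI passes in edx (rtld_fini), the addresses of __libc_csu_fini and __libc_csu_init, the argument vector (ecx), the number of arguments (esi), and finally the address of main.
    5. Call __libc_start_main. This function (source code here) sets up some glibc-internal state, eventually calls main, and passes main's return value to exit. It never returns.
    6. If for any reason __libc_start_main did return, the hlt instruction placed after the call would execute. hlt is a privileged instruction; executing it in user mode raises a general-protection fault, which Linux delivers as SIGSEGV, so the program crashes instead of running into whatever follows.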
    7. The final series of nop instructions is padding inserted by the assembler so the next function starts at a multiple of 16 bytes for better performance. It is never reached in normal execution.