When I look at disassembled code in GDB, I very often see these three instructions at the start of a function:
0x08048548 <+0>: lea ecx,[esp+0x4]
0x0804854c <+4>: and esp,0xfffffff0
0x0804854f <+7>: push DWORD PTR [ecx-0x4]
I have usually ignored them, because the stack frame gets created right after those three instructions, which is how functions normally start. What are they for?
Thank you.
This is aligning the stack pointer to a 16-byte boundary (the "and esp,0xfffffff0" clears the low four bits of ESP), because some instructions (for example, aligned SSE loads and stores) need their data to be 16-byte aligned.
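To make the masking concrete, here is a minimal C sketch of the same arithmetic. The function name align_down and the sample addresses in the usage note are made up for illustration; only the mask 0xfffffff0 comes from the disassembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round a 32-bit address down to a multiple of
   `alignment`, which must be a power of two. With alignment == 16 the
   mask ~(alignment - 1) is 0xfffffff0, the constant used by
   `and esp,0xfffffff0`. */
uint32_t align_down(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & ~(alignment - 1);
}
```

For example, align_down(0xffffd12c, 16) gives 0xffffd120, and an address that is already a multiple of 16 is left unchanged. The stack grows downwards on x86, so rounding ESP down never overwrites anything the caller already pushed.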
A good compiler will examine the call graph (figure out what calls what), and will decide that one of the following holds:
- the function doesn't need stack alignment itself and doesn't call other functions that need stack alignment, and therefore no stack alignment is needed
- all of the function's callers used an aligned stack, and therefore the stack can be realigned with a fixed adjustment such as
sub esp, 8
(which could be merged with any code that reserves stack space for local variables)
- none of the above can be proven to be true, so the function has to assume the "worst case" and enforce alignment itself (e.g. the instructions you've seen at the start of the function)
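As a rough illustration of why a fixed adjustment is enough when every caller is known to keep the stack aligned, the sketch below models ESP as a plain integer. It assumes 32-bit code, a caller whose ESP is 16-byte aligned just before the call, and a callee prologue of "push ebp" followed by "sub esp, 8". The function name and that exact prologue are assumptions for illustration, not something taken from your disassembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: ESP after the callee's prologue, given the caller's
   ESP just before the `call` instruction. */
uint32_t esp_after_prologue(uint32_t esp_before_call)
{
    /* Assumption: the caller kept its stack 16-byte aligned. */
    assert(esp_before_call % 16 == 0);

    uint32_t esp = esp_before_call;
    esp -= 4; /* `call` pushes the 4-byte return address */
    esp -= 4; /* `push ebp` (assumed frame setup) */
    esp -= 8; /* `sub esp, 8` restores 16-byte alignment */
    return esp;
}
```

The three subtractions add up to 16, so the result is aligned whenever the input was. No masking is needed, which is why this case is cheaper than the "worst case" sequence.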
Of course, for a good compiler, the last case (where the code you've shown is needed) is extremely rare.
However, most compilers can't be this good, because they're not able to see the whole program: if the program is split into multiple object files that are compiled separately, the compiler can only see a fraction of the program at a time. They can't figure out much (or any) of the call graph, so the last case (where the code you've shown is needed) becomes very common. To solve this you need "link time code generation" (also called link-time optimization), but often people don't bother.
Note: For AVX2 you want 32-byte alignment, for AVX-512 you want 64-byte alignment, and for some things (to avoid false sharing in heavily threaded code) you might want "cache line size alignment" (typically also 64-byte alignment). This makes the "examine call graph to determine what alignment is actually needed" algorithm a little more complicated than what I described.
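If you want to see these larger alignments from the source side, here is a small C11 sketch. The function names are made up, and 64 is only the typical cache line size mentioned above. Support for over-aligned local variables is implementation-defined in C11, so treat this as something that works with GCC and Clang on x86 rather than a guarantee.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p is a multiple of `alignment` bytes.
   `alignment` must be a power of two. */
int is_aligned(const void *p, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p % alignment) == 0;
}

/* A local variable that requests 64-byte alignment. The compiler has to
   make sure the stack slot for `buffer` is 64-byte aligned, for example
   by masking the stack pointer in the prologue. */
int local_is_cache_line_aligned(void)
{
    alignas(64) unsigned char buffer[64];
    buffer[0] = 0;
    return is_aligned(buffer, 64);
}
```

Disassembling a function like local_is_cache_line_aligned is a quick way to check which realignment code your compiler emits for a stricter requirement.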