Assembly x86 FPU - Stack Confusion

I am trying to understand the FPU, and I am pretty confused. As I understand it from here, the FPU has its own stack. But look at this code (NASM), for example:

global _main

extern _printf

section .data
    hellomessage db `Hello World!\n`, 10, 0
    numone dd 1.2
    digitsign db '%f', 0xA, 0

section .text
_main:
    ;Greet the user
    push hellomessage
    call _printf
    add esp,4

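    sub esp, 8              ; reserve 8 bytes on the regular stack for the double argument

    fld dword[numone]       ; load the 4-byte float into st0 (an x87 register, not memory)
    fstp qword[esp]         ; pop st0 and store it as an 8-byte double at [esp]

    push digitsign          ; format string pointer; the double is now at [esp+4]
    call _printf            ; %f expects a double (8 bytes)
    add esp, 12             ; discard the 4-byte pointer and the 8-byte double
    ret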

I have to have the sub esp, 8 line to "make space" for a double, otherwise the program crashes. But by doing this, I change the pointer of the "regular stack", which does not make sense with my idea of two separate stacks.

I am certain that I do not understand something, but I do not know what this is.

Solution

x87 loads and stores use the same memory addresses as every other instruction. The x87 "stack" is the eight registers st0..st7, not memory at all.

See "SIMPLY FPU", Chap. 1: Description of FPU Internals, for details on the x87 register stack.

fstp qword[esp] stores 8 bytes to the regular call stack, just as mov [esp], eax / mov [esp+4], edx would. Addressing modes don't change meaning when used with x87 load/store instructions: your process has only one address space.

So if you remove the sub esp, 8, your fstp would overwrite your return address (and, since it writes 8 bytes, the argc slot just above it).

Then, at the end of the function, add esp, 12 would leave ESP pointing 8 bytes above the return address, because it undoes 12 bytes of stack adjustment when only 4 (the push digitsign) were actually made. ret then pops whatever is there into EIP. You segfault either when the CPU tries to fetch code from that bad address or when the bytes there decode to instructions that fault.

Above main's return address you'll find argc and then char **argv, so the dword 8 bytes above the return address is argv. argv is a pointer to an array of pointers, so using it as a return address means the CPU executes those pointer values as code.

Use a debugger to watch what happens to registers and memory as you single-step.

Note that add esp, 4 followed by sub esp, 8 is a bit silly. add esp, +4 - 8 (i.e. add esp, -4) does the same thing in one instruction and documents where the numbers come from.