Floating point division in X86 assembly gives weird result

I'm currently studying the Simply FPU tutorial. As an exercise, I'd like to learn how to divide floating-point numbers in assembly. Let's say I want to divide 48.6 by 17.1. Here's the code:

format PE console 4.0
entry main
include 'win32a.inc'

section '.data' data readable writeable
num1 dq 48.6
num2 dq 17.1
result dq ?
fmt db "%g", 10
szBuff db 32 dup (0)

section '.code' code readable executable
main:
fld qword [num1]
fld qword [num2]
fdivp 
fstp qword [result]
invoke printf, fmt, result 
invoke ExitProcess, 0


section '.idata' import data readable
library kernel32,'kernel32.dll', msvcrt,'msvcrt.dll'
import kernel32, ExitProcess,'ExitProcess'
import msvcrt, printf, 'printf'

The output of the code is

7.62883e+265

What goes wrong here?

As suggested by Jester, I examined the code using OllyDbg

[OllyDbg screenshot]

I guess the result was correct, but somehow it was messed up by printf?

Solution

Upvote for using that tutorial, it's very good :)

A couple of problems there:

Your values will not be in st(0) and st(7); they will be in st(1) and st(0). The register naming is fixed: st(0) is always the top of the stack, but the barrel turns. When you load something, it goes into st(0). If you load something else afterwards, the barrel rotates: what you previously had moves to st(1) and the new value is put in st(0).
Make sure your assembler generates properly sized instructions, for example for the fld and the fst.
Make sure your invoke macro knows how to pass floating point arguments to printf. With %g, printf expects the 8-byte double itself on the stack, whereas a bare label such as result in FASM is the 4-byte address of the variable.
You are not cleaning up the FPU stack (this doesn't affect operation here, it's just a general problem).
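
For the FASM program in the question, the printf call is the most likely culprit. Here is a minimal sketch of one way to fix it, assuming the standard macros from win32a.inc. It pushes the 64-bit result as two dwords and uses cinvoke instead of invoke, since printf is a cdecl function. The explicit 0 terminator on fmt is also my addition, and I have not run this on Windows, so treat it as a starting point rather than a verified build:

```asm
format PE console 4.0
entry main
include 'win32a.inc'

section '.data' data readable writeable
num1   dq 48.6
num2   dq 17.1
result dq ?
fmt    db "%g", 10, 0           ; explicit NUL terminator (added)

section '.code' code readable executable
main:
    fld   qword [num1]          ; st(0) = 48.6
    fld   qword [num2]          ; st(0) = 17.1, st(1) = 48.6
    fdivp                       ; st(0) = 48.6 / 17.1, one item popped
    fstp  qword [result]        ; store the quotient and pop, FPU stack is empty
    ; Pass the double by value as two dwords. Arguments are pushed right to
    ; left, so the low half ends up at the lower stack address, as printf expects.
    cinvoke printf, fmt, dword [result], dword [result+4]
    invoke ExitProcess, 0

section '.idata' import data readable
library kernel32,'kernel32.dll', msvcrt,'msvcrt.dll'
import kernel32, ExitProcess,'ExitProcess'
import msvcrt, printf, 'printf'
```

The only functional change from the original is the printf line: `result` on its own passes the address of the variable, while `dword [result]` and `dword [result+4]` pass its contents.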

I recommend you use a debugger to single step the code, so you can see what's happening and you don't even need to mess with trying to use printf.

Update: sample session using gdb on Linux with working code (edited for clarity):

$ cat > div.s
.intel_syntax noprefix
.globl main

.data
num1: .double 48.6
num2: .double 17.1
fmt: .string "%g\n"

.text
main:
    sub esp, 16
    fld qword ptr [num1]    # st(0)=48.6
    fld qword ptr [num2]    # st(0)=17.1, st(1)=48.6
    fdivp                   # st(0)=st(1)/st(0), one item popped
    fstp qword ptr [esp+4]  # store as argument and pop
    mov dword ptr [esp], offset fmt
    call printf
    add esp, 16
    ret
$ gcc -masm=intel -m32 -g div.s -o div
$ ./div
2.84211
$ gdb ./div
GNU gdb (GDB) 7.3.50.20111117-cvs-debian
(gdb) br main
Breakpoint 1 at 0x80483c4: file div.s, line 11.
(gdb) r
Starting program: div 
Breakpoint 1, main () at div.s:11
11          sub esp, 16
(gdb) n
main () at div.s:12
12          fld qword ptr [num1]    # st(0)=48.6
(gdb) 
13          fld qword ptr [num2]    # st(0)=17.1, st(1)=48.6
(gdb) info float
=>R7: Valid   0x4004c266666666666800 +48.60000000000000142      
  R6: Empty   0x00000000000000000000
  R5: Empty   0x00000000000000000000
  R4: Empty   0x00000000000000000000
  R3: Empty   0x00000000000000000000
  R2: Empty   0x00000000000000000000
  R1: Empty   0x00000000000000000000
  R0: Empty   0x00000000000000000000
(gdb) n
14          fdivp                   # st(0)=st(1)/st(0), one item popped
(gdb) info float
  R7: Valid   0x4004c266666666666800 +48.60000000000000142      
=>R6: Valid   0x400388ccccccccccd000 +17.10000000000000142      
  R5: Empty   0x00000000000000000000
  R4: Empty   0x00000000000000000000
  R3: Empty   0x00000000000000000000
  R2: Empty   0x00000000000000000000
  R1: Empty   0x00000000000000000000
  R0: Empty   0x00000000000000000000
(gdb) n
15          fstp qword ptr [esp+4]  # store as argument and pop
(gdb) info float
=>R7: Valid   0x4000b5e50d79435e4e16 +2.842105263157894584      
  R6: Empty   0x400388ccccccccccd000
  R5: Empty   0x00000000000000000000
  R4: Empty   0x00000000000000000000
  R3: Empty   0x00000000000000000000
  R2: Empty   0x00000000000000000000
  R1: Empty   0x00000000000000000000
  R0: Empty   0x00000000000000000000
(gdb) n
16          mov dword ptr [esp], offset fmt
(gdb) info float
  R7: Empty   0x4000b5e50d79435e4e16
  R6: Empty   0x400388ccccccccccd000
  R5: Empty   0x00000000000000000000
  R4: Empty   0x00000000000000000000
  R3: Empty   0x00000000000000000000
  R2: Empty   0x00000000000000000000
  R1: Empty   0x00000000000000000000
=>R0: Empty   0x00000000000000000000

Note that gdb prints the instruction that is about to be executed next. The arrow marks the top of the FPU stack, which is st(0) by definition; st(1), st(2) and so on follow in increasing register number, wrapping around from R7 to R0 if necessary. The first dump shows 48.6 loaded into st(0), since R7 is the register marked by the arrow, and all other registers are empty. In the second dump, 17.1 has been loaded into st(0) as well, because the arrow has moved to R6 (the barrel rotated), and 48.6 is now st(1). FDIVP performs the division and pops one item from the stack, so the third dump shows the result in st(0) and everything else empty. FSTP then stores st(0) as the argument for printf and pops it, so in the last dump all registers are empty and the arrow has wrapped around to R0.