Comparing 80-bit floats in FASM with a given accuracy

I am writing a program that calculates Pi using the Nilakantha series in a loop, with an accuracy of at least 0.05%. The loop should exit when the current calculated value res and the previously calculated value prev satisfy |res - prev| <= 0.0005. I've read up on floating-point comparisons in FASM, but I still don't fully understand how they work. Currently the program runs forever and never exits the loop. While debugging, I've often seen the floats turn into 1.#IND00, which is supposed to be a NaN. How do I write a correct comparison?

format PE console
entry start

include 'win32a.inc'


section '.code' code readable executable
; 3 + 4/(2*3*4) - 4 / (4*5*6) + 4/(6*7*8) - ...
start:
FINIT
piLoop:

; calculating denominator of fraction that will be added: x1*x2*x3
FLD [denominator]
FMUL [zero]
FADD [x1]
FMUL [x2]
FMUL [x3]
FSTP [denominator]

; changing denominator product values for next loop: x1 +=2, x2 += 2, x3 += 2
FLD [x1]
FADD [stepValue]
FSTP [x1]
FLD [x2]
FADD [stepValue]
FSTP [x2]
FLD [x3]
FADD [stepValue]
FSTP [x3]

;calculating numerator: multiplying numerator by -1
FLD [numerator]
FMUL [sign]
FSTP [numerator]

; calculating fraction: +-4 / (x1 * x2 * x3)
FLD [numerator]
FDIV [denominator]
FSTP [fraction]

; adding calculated fraction to our answer
FLD [res]
FADD [fraction]
FSTP [res]

; the comparison part, incorrect?
FLD [res]
FSUB [prev]
FABS
FCOM [accuracy]
FSTSW AX
SAHF

add [i], 1


; prev = res
FLD [res]
FSTP [prev]
jb endMet
jmp piLoop
endMet:

invoke printf, steps_string, [i]

invoke getch
invoke ExitProcess, 0

section '.data' data readable writable
steps_string db "Calculation completed. The Nilakantha Series took %d steps.",10,0
pi_string db "accurate pi = %lf, calculated pi = %lf", 10, 0


res dq 3.0
x1 dq 2.0
x2 dq 3.0
x3 dq 4.0
stepValue dq 2.0
fraction dq 0.0
numerator dq -4.0
denominator dq 0.0
sign dq -1.0
zero dq 0.0
N dd 20
i dd 0
accuracy dq 0.0005
calc dq ?
prev dq 3.0

section '.idata' import data readable
library kernel, 'kernel32.dll',\
        msvcrt, 'msvcrt.dll',\
        user32,'USER32.DLL'

include 'api\user32.inc'
include 'api\kernel32.inc'
import kernel,\
       ExitProcess, 'ExitProcess',\
       HeapCreate,'HeapCreate',\
       HeapAlloc,'HeapAlloc'
include 'api\kernel32.inc'
import msvcrt,\
       printf, 'printf',\
       sprintf, 'sprintf',\
       scanf, 'scanf',\
       getch, '_getch'

Solution

(Just expanding on my comment, so that this gets an answer.)

For background: the complicated sequence of instructions for floating-point compares comes from the fact that early x86 CPUs didn't have the FPU on-board; it was an optional separate chip (the 8087 and its successors), and its ability to interact with the CPU was limited. So the FCOM instruction couldn't set the CPU's FLAGS register directly. Instead, it sets the condition-code bits (C0, C2, C3) in the floating-point status word, which was internal to the coprocessor. The FSTSW instruction can be used to copy the status word from the coprocessor into a general-purpose CPU register (FSTSW AX), and SAHF then copies the appropriate bits of AH into FLAGS.

After all this, you finally get FLAGS set to reflect the result of the comparison, and the status-word bits are laid out so that FLAGS end up set the same way as for an unsigned integer comparison: ZF is set if the numbers were equal, CF if ST(0) was less than the operand (the difference was strictly negative), and neither if it was greater. So you can now use conditional jumps like ja, jb, jbe, etc., just as you would for unsigned integer comparisons. Note that PF=1 means the comparison was unordered (at least one operand was NaN); in that case ZF and CF are set as well, so jb and jbe would be taken, which is why you need to check PF (e.g. with jp) first.

(The Pentium Pro added FCOMI, which sets EFLAGS from the FP compare the same way FCOM/FSTSW/SAHF does, without the extra instructions. See also the question "Why do x86 FP compares set CF like unsigned integers, instead of using signed conditions?")

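However, your code has add [i], 1 in between, and like most x86 arithmetic instructions, add sets FLAGS based on its result. So your carefully retrieved FLAGS are overwritten, and the jb a couple of lines down tests the result of the add instead of the FCOM. That add leaves CF=0 on every iteration until i wraps around at 2^32, so in practice jb is never taken and the loop never exits. (The FLD [res] / FSTP [prev] pair in between is harmless, because x87 loads and stores don't modify EFLAGS.) Thus you need to rearrange those instructions.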

For example, do the add before SAHF (or before FCOMI, if you switch to that), so that nothing between the compare and the jump touches FLAGS.