Could someone please help me understand the following code?
fld qword ptr [L1000F168]
fcomp qword ptr [L1000A2F0]
fld qword ptr [L1000F168]
fnstsw ax
test ah,41h
jnz L100012F0
It is compiler output, disassembled from an executable.
What I have understood so far is that
if(value[L1000F168] != value[L1000A2F0])
continue following
else
goto L100012F0
Am I correct?
I do not understand why [L1000F168] is loaded twice, or why ah is compared with 41h here. Does 41h mean (Invalid operation) or (stack fault)?
See this page for info on the x87 status word.
Note that the 16-bit status word is stored to ax, but the test instruction only looks at the upper 8 bits (ah). So 41h matches the C0 and C3 bits of the status word (bits 8 and 14), not the exception flags in the low byte. Using an 8-bit test on ah avoids a slowdown on Intel CPUs from using a 16-bit test with an imm16. (The operand-size prefix changes the length of the rest of the instruction, which causes the so-called length-changing-prefix decoder stall.)
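To see why 41h lines up with C0 and C3, here is a small Python sketch (not part of the original code; the helper names high_byte and jnz_taken are made up for illustration). It places the condition-code bits where the x87 status word keeps them and applies the same mask to the high byte:

```python
# Bit positions of the x87 condition codes in the 16-bit status word.
C0 = 1 << 8
C1 = 1 << 9
C2 = 1 << 10
C3 = 1 << 14


def high_byte(status_word):
    """Value left in ah after fnstsw ax (hypothetical helper name)."""
    return (status_word >> 8) & 0xFF


def jnz_taken(status_word):
    """True when 'test ah,41h' leaves ZF clear, so the jnz jumps."""
    return (high_byte(status_word) & 0x41) != 0


if __name__ == "__main__":
    print(hex(C0 | C3))             # the mask within the full status word
    print(hex(high_byte(C0 | C3)))  # the same mask as seen in ah
```

C2 (bit 10) and everything in the low byte, including the exception flags, fall outside the mask.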
fld / fcomp / fld / fnstsw: That looks weird to me, too. I wondered if the goal was to have things like the denormal bit in the status word set based on the memory location, but C0 and so on set based on the fcomp. (This isn't what you get, because fld leaves C0, C2, and C3 undefined and sets or clears C1.)
Intel's instruction reference manual says fld leaves C0 and C3 undefined, so the compiler that generated this code was depending on some specific behaviour. Maybe without an fwait, the status word wouldn't be updated from the fcomp yet? I haven't grokked the whole fwait thing. I haven't found (or looked hard for) an explanation of when you needed fwait on old CPUs, and when you didn't. As I understand it, you never do on P5 or later.
Anyway, I think what this code does is:
if (! ([L1000F168] > [L1000A2F0]))
goto L100012F0;
Testing not-greater is the same as testing <= or unordered. This assumes that fld doesn't actually modify C0 and C3. Maybe the manual only says it leaves them undefined because it can set them when loading a denormal, or some other condition? Or maybe it doesn't at all in current silicon. Or maybe it's yet another error in the Intel manual, and this is actually known to be fine. Or maybe this code doesn't work anymore! Better test it with a debugger.
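As a sanity check on that reading, here is a rough Python model of the fcomp / fnstsw / test / jnz sequence. It assumes that the second fld leaves C0 and C3 alone, and the function names are hypothetical:

```python
import math

# Condition-code bit positions in the x87 status word.
C0, C2, C3 = 1 << 8, 1 << 10, 1 << 14


def fcom_status(st0, src):
    """Condition-code bits that fcom/fcomp set when comparing st0 with src."""
    if math.isnan(st0) or math.isnan(src):
        return C3 | C2 | C0   # unordered
    if st0 > src:
        return 0              # C3 = C2 = C0 = 0
    if st0 < src:
        return C0
    return C3                 # equal


def jump_taken(st0, src):
    """Models fnstsw ax / test ah,41h / jnz for the comparison above."""
    ah = (fcom_status(st0, src) >> 8) & 0xFF
    return (ah & 0x41) != 0


if __name__ == "__main__":
    nan = float("nan")
    for a, b in [(2.0, 1.0), (1.0, 2.0), (1.0, 1.0), (nan, 1.0)]:
        # The last two columns should agree if the reading is right.
        print(a, b, jump_taken(a, b), not (a > b))
```

The jump is taken for less-than, equal, and unordered, which is exactly !(a > b).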
I think a more sensible way to get the result the programmer was probably hoping for would be:
fld qword ptr [L1000F168]
fcom qword ptr [L1000A2F0]
fnstsw ax
test ah,41h ; C0|C3: both zero iff st(0) > [L1000A2F0]
jnz L100012F0
Since the OP says the code was compiler output, I guess that means the compiler didn't have good x87 optimizations to avoid the pop/reload of the same value.
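Or, on P6 and later, with fcomi (which only takes register operands, so both values have to be loaded):
fld qword ptr [L1000A2F0]
fld qword ptr [L1000F168]
fcomi st(0),st(1) ; sets ZF/PF/CF directly, no fnstsw needed
fstp st(1) ; drop the extra value; fstp doesn't touch EFLAGS
jna L100012F0 ; jump unless st(0) > [L1000A2F0]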
I may have the sense of one of the comparisons backwards, so double-check this if something doesn't make sense.
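One way to double-check the sense of the flag-based version is to model the documented fcomi results in Python. This is only a sketch of the flag logic, with made-up function names; it does not execute any x87 code:

```python
import math


def fcomi_flags(st0, sti):
    """(ZF, PF, CF) as documented for fcomi comparing st(0) with st(i)."""
    if math.isnan(st0) or math.isnan(sti):
        return (1, 1, 1)   # unordered
    if st0 > sti:
        return (0, 0, 0)
    if st0 < sti:
        return (0, 0, 1)
    return (1, 0, 0)       # equal


def jna_taken(zf, cf):
    """jna (the same opcode as jbe) jumps when CF = 1 or ZF = 1."""
    return bool(cf or zf)


if __name__ == "__main__":
    nan = float("nan")
    for a, b in [(2.0, 1.0), (1.0, 2.0), (1.0, 1.0), (nan, 1.0)]:
        zf, pf, cf = fcomi_flags(a, b)
        # jna should be taken exactly when a > b is false.
        print(a, b, jna_taken(zf, cf), not (a > b))
```

Under this model jna is taken for below, equal, and unordered, so it jumps unless st(0) is greater.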
What's this code from? Is it compiler output? Or might it have been hand-written asm?