The Intel® 64 and IA-32 Architectures Software Developer’s Manual (vol. 2) says, under "FPU Flags Affected" for the FST/FSTP instructions, that C0, C2, and C3 are undefined.
A simple test (which proves almost nothing on its own) shows that C0, C2, and C3 can be left unaffected:

    #include <iostream>
    #include <bitset>
    #include <cstdint>

    int main()
    {
        double x = -1.0;
        std::uint16_t a = 0, b = 0;
        // Push a copy of st(0), compare it with 0.0 (FTST), save the status word,
        // then pop with FSTP and save the status word again.
        asm volatile ("fld %[x] ; ftst ; fnstsw %%ax ; mov %%ax, %[a] ; fstp %%st ; fnstsw %%ax ; mov %%ax, %[b] ;"
                      : [a]"=m"(a), [b]"=m"(b)
                      : [x]"t"(x)
                      : "ax", "cc", "memory");  // fnstsw overwrites AX, so declare it clobbered
        std::cout << std::bitset< 16 >(a) << std::endl;
        std::cout << std::bitset< 16 >(b) << std::endl;
        std::cout << " ^   ^ ^" << std::endl;  // marks C3 (bit 14), C2 (bit 10), C0 (bit 8)
    }

What does "Undefined" mean here? Can FSTP change the values of those bits, or does it simply leave them alone?
It would say "unmodified" or "unaffected" if it meant that.
"Undefined" means that the value could be anything, and might differ between CPU microarchitectures. Some CPUs might preserve the old value, some might clear or set the bits, or leak some microarchitectural state into the bits that's potentially different every time you run the instruction. Or they might be set according to the number being NaN or Inf or not.
But Intel doesn't document which of those behaviours will happen and, most importantly, leaves its options open to do something different in future CPUs. So testing what current CPUs do is useless if you want to write safe, future-proof code.
(It's likely that Intel will continue to do whatever they currently do, though. But some ground-up redesign might be different.) And of course other vendors might be different. Worth checking AMD's x86 manuals to see if they say what their CPUs do.
(Producing an undefined value is not like C Undefined Behaviour. It doesn't break the rest of your program. C2 will read as 0 or 1; the instruction doesn't put it into some weird state where it might change again even without running any instructions that are documented as affecting C2.)
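One practical consequence: if you read the status word after FST/FSTP, the safe approach is to ignore C0, C2, and C3 rather than rely on whatever a given CPU happens to leave there. Here is a minimal sketch of that idea, using the architectural bit positions of the condition codes in the x87 status word. The helper names (`defined_after_fstp`, `same_defined_bits`) are made up for illustration and don't come from any library.

```cpp
#include <cstdint>

// Architectural bit positions of the x87 status-word condition codes.
constexpr std::uint16_t kC0 = 1u << 8;
constexpr std::uint16_t kC1 = 1u << 9;
constexpr std::uint16_t kC2 = 1u << 10;
constexpr std::uint16_t kC3 = 1u << 14;

// Hypothetical helper: clear the bits that FST/FSTP documents as "undefined",
// keeping everything else (exception flags, C1, TOP, ...).
constexpr std::uint16_t defined_after_fstp(std::uint16_t status_word) {
    return static_cast<std::uint16_t>(status_word & ~(kC0 | kC2 | kC3));
}

// Hypothetical helper: compare two status words saved after FSTP,
// ignoring the undefined condition codes.
constexpr bool same_defined_bits(std::uint16_t lhs, std::uint16_t rhs) {
    return defined_after_fstp(lhs) == defined_after_fstp(rhs);
}
```

With this, two status words that differ only in C0/C2/C3 compare equal, which is all the manual lets you depend on.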
Another usage of "undefined" in asm documentation is for the bsf and bsr instructions, where the destination register value is "undefined" for input = 0 (and ZF is set to 1).
In practice Intel hardware leaves the destination unmodified in that case. (So it's a bit like a cmov: you can put a result for input=0 into the output register before running bsf.) AMD actually does document this behaviour in their manuals, and presumably some software that Intel cares about depends on it. So Intel is highly unlikely to change it, and IDK why they don't just document it so we can take advantage of it. lzcnt and tzcnt already exist (tzcnt as part of BMI1, lzcnt with its own feature bit) with well-defined input=0 behaviour.
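If you want the "result for input=0" behaviour without relying on anything undocumented, you can just write the zero check explicitly and let the compiler pick the instructions. A minimal sketch, assuming GCC or Clang (`__builtin_ctz` is their builtin and is itself undefined for 0, which is why the check comes first); the name `bit_scan_forward_or` is made up for this example:

```cpp
#include <cstdint>

// Hypothetical helper: index of the lowest set bit of x, or `if_zero` when x == 0.
// __builtin_ctz (GCC/Clang) has an undefined result for 0, so it is only
// evaluated on the non-zero path.
constexpr int bit_scan_forward_or(std::uint32_t x, int if_zero) {
    return x != 0 ? __builtin_ctz(x) : if_zero;
}
```

For example, `bit_scan_forward_or(mask, -1)` gives -1 for an empty mask and the bit index otherwise, on any CPU and any vendor.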
This dst-unmodified behaviour has a real performance cost: it means the instruction needs an input dependency on what would otherwise be a write-only destination. This can create false dependencies that limit out-of-order execution. (And even worse, on Intel CPUs before Skylake, lzcnt and tzcnt had the same false output dependency. popcnt still has it for at least a couple of uarches after Skylake.)
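The usual workaround for such a false output dependency is to xor-zero the destination register right before the instruction, since xor-zeroing is recognized as not depending on the old value. Here is a rough sketch with GNU inline asm, assuming x86-64, GCC or Clang, and a CPU that supports POPCNT; on anything else it falls back to a portable loop. The function names are made up for illustration, and normally you'd just let the compiler handle this for its own popcount builtin.

```cpp
#include <cassert>
#include <cstdint>

// Portable reference: clears the lowest set bit once per iteration.
inline std::uint64_t popcount_portable(std::uint64_t x) {
    std::uint64_t n = 0;
    for (; x != 0; x &= x - 1) ++n;
    return n;
}

// Hypothetical helper: popcnt with the destination zeroed first, so the
// instruction doesn't have to wait for whatever last wrote that register.
inline std::uint64_t popcount_no_false_dep(std::uint64_t x) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    std::uint64_t result;
    asm("xor %k0, %k0\n\t"   // dependency-breaking zeroing of the output register
        "popcnt %1, %0"      // assumes the CPU supports POPCNT
        : "=&r"(result)      // early-clobber: written before the input is read
        : "r"(x)
        : "cc");
    return result;
#else
    return popcount_portable(x);
#endif
}

// Quick sanity check against the portable version.
inline void sanity_check() {
    assert(popcount_no_false_dep(0) == popcount_portable(0));
    assert(popcount_no_false_dep(0xFFu) == 8);
    assert(popcount_no_false_dep(~std::uint64_t{0}) == 64);
}
```

The early-clobber constraint matters here: without it the compiler could pick the same register for input and output, and the xor would wipe out the input.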