Recall that the exponent mask for a 32-bit float is 0x7F80'0000, and a number is a denormal if and only if the bits under that mask are all zeroes and there's at least one non-zero bit in the mantissa; similarly, if the exponent part of a float consists of only ones, the value is either an INF (mantissa of all zeroes) or a NaN (mantissa with some non-zero bits).
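To make the bit-level picture concrete, here is a minimal sketch (plain C++; the helper names are mine, purely for illustration) of how those fields classify a value:

```cpp
#include <cstdint>
#include <cstring>

// Classify a 32-bit float by its bit pattern, using the masks above.
bool is_denormal(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);            // safe type-pun
    const std::uint32_t exponent = bits & 0x7F800000u;
    const std::uint32_t mantissa = bits & 0x007FFFFFu;
    return exponent == 0 && mantissa != 0;          // denormal (subnormal)
}

bool is_inf_or_nan(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return (bits & 0x7F800000u) == 0x7F800000u;     // exponent all ones
}
```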
FTZ (flush-to-zero) and DAZ (denormals-are-zero) are two register flags that, if set, change the behavior of the CPU (so they're a runtime thing: the same binary code will behave differently depending on these flags). The first one flushes to zero any denormal value resulting from (certain?) floating-point operations, whereas the second ensures that any denormal value encountered by the CPU as an input to (certain?) operations is treated as zero. I couldn't find any documentation on which operations are affected by these flags, so if you know of any, I would really appreciate the reference.
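(For reference, both flags live in the MXCSR register and can be toggled at runtime; a minimal sketch using the helper macros from xmmintrin.h/pmmintrin.h:)

```cpp
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE
#include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE

// Enable both flags in MXCSR for the current thread:
// FTZ affects denormal results, DAZ affects denormal inputs.
void enable_ftz_daz() {
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
}
```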
On a different part of the spectrum, the flag -ffinite-math-only (one of the flags enabled by -ffast-math on GCC and Clang; similar considerations apply to /fp:fast on MSVC) enables codegen optimizations that are correct only under the assumption that no floating-point value encountered in the code is an INF or a NaN.
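As far as I can tell, GCC and Clang expose this assumption at compile time through the predefined macros __FAST_MATH__ and __FINITE_MATH_ONLY__, and MSVC defines _M_FP_FAST under /fp:fast; a sketch of a compile-time check (treat the exact macro combination as my assumption rather than gospel):

```cpp
// Detect at compile time whether the build assumes no INFs/NaNs.
#if (defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__) || defined(_M_FP_FAST)
    #define ASSUMES_FINITE_MATH 1   // hypothetical helper macro, for illustration
#else
    #define ASSUMES_FINITE_MATH 0
#endif
```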
Given all that, these are my questions:
1. can a (e.g.) 0x0000'FFFF returned by a call to _mm_cmpeq_ps() be flushed to zero (possibly only on some micro-architectures)? (I'm aware that the actual comparison IS affected by DAZ, as any denormal will compare equal to zero, as explained here: https://stackoverflow.com/a/54047888, so that's not what I'm asking; but this very fact is what worries me when it comes to bitwise logical operations.)
2. can a mask like 0x0000'FFFF be treated as zero when used (e.g.) as an argument for _mm_and_ps()?
3. can the compiler do optimizations that will invalidate the correctness of the generated code on masks like 0xFFFF'0000, since strictly speaking they're NaNs?

If these are actual issues, one possible workaround (at least for the bitwise logical operations) would be to use the integer counterparts of the relevant instructions, as in the sketch below, but that might incur delays due to crossing execution domains, depending on the CPU microarchitecture. Moreover, for the comparison instructions it would simply not work as-is, and one would have to write auxiliary methods to basically re-implement IEEE-754's comparisons, which is less than ideal. I'm hoping that both the register flags and the compiler flags have been implemented with the background assumption that they'll "do the right thing", but I couldn't find any information about this.
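For clarity, this is the kind of integer-counterpart workaround I have in mind for the bitwise case (the function name is just for illustration):

```cpp
#include <immintrin.h>

// Do the AND in the integer domain, so no instruction ever has to
// interpret the mask as FP data; only a possible domain-crossing
// penalty remains as a concern.
__m128 and_via_integer_domain(__m128 a, __m128 b) {
    __m128i ai = _mm_castps_si128(a);    // reinterpret bits, no conversion
    __m128i bi = _mm_castps_si128(b);
    __m128i ri = _mm_and_si128(ai, bi);  // pand instead of andps
    return _mm_castsi128_ps(ri);
}
```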
can a (e.g.) 0x0000'FFFF returned by a call to _mm_cmpeq_ps() be flushed to zero (possibly only on some micro-architectures)?
_mm_cmpeq_ps() returns all-ones or all-zeros in each 32-bit element of a vector, so either 0xFFFFFFFF or 0. You may get a 0x0000FFFF from an integer instruction, such as _mm_cmpeq_epi16.
Whether the value is interpreted as zero depends on the instruction that receives such a value as input. Basically, DAZ and FTZ only apply to instructions that need to interpret the FP values in vectors. For example, if you pass it to an addps (_mm_add_ps) as an input argument, then yes, that value may be interpreted as zero, depending on the DAZ flag in MXCSR. If you pass it to a bitwise operation such as andps, or store it to memory, then no, the value is used unchanged.
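To illustrate the distinction, a small sketch (the function name is just for illustration):

```cpp
#include <immintrin.h>

// The mask from cmpps is only "data with FP meaning" if it reaches an
// arithmetic instruction; bitwise ops and stores pass it through
// bit-for-bit regardless of DAZ/FTZ.
__m128 keep_where_equal(__m128 a, __m128 b, __m128 x) {
    __m128 mask = _mm_cmpeq_ps(a, b);  // each lane: all-ones or all-zeros
    return _mm_and_ps(mask, x);        // andps: bits used unchanged
    // By contrast, _mm_add_ps(mask, x) would interpret each lane as an
    // FP number (all-ones is a NaN pattern), and any denormal input
    // lane would be subject to DAZ.
}
```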
can a mask like 0x0000'FFFF be treated as zero when used (e.g.) as an argument for _mm_and_ps()?
No, bitwise operations have equivalent effect between the integer and FP domains; andps and andpd work exactly like pand. Similarly, various shuffle instructions like unpcklps and shufps don't interpret the FP values and operate on the bits "as is".
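A familiar example is the SSE fabs idiom, where the mask constant is itself a NaN bit pattern and is still processed bit-for-bit:

```cpp
#include <immintrin.h>

// Clear the sign bit of every lane. 0x7FFFFFFF has an all-ones exponent
// and a non-zero mantissa (a NaN pattern), but andps never interprets it
// as an FP value.
__m128 abs_ps(__m128 x) {
    const __m128 abs_mask = _mm_castsi128_ps(_mm_set1_epi32(0x7FFFFFFF));
    return _mm_and_ps(x, abs_mask);
}
```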
can the compiler do optimizations that will invalidate the correctness of the generated code on masks like 0xFFFF'0000, since strictly speaking they're NaNs?
You will have to ask your compiler vendor or look into its code. Formally, I don't think the compiler is prohibited from performing optimizations on the code under the assumption that there will be no NaNs or infinities.
In practice, in vectorized code compilers usually don't interpret vector contents too deeply, especially if they originate from memory that is initialized elsewhere, which is a typical use case for vector code. Bit patterns that correspond to NaNs and infinities may appear in vectors as a result of bitwise operations or loads from memory, and those patterns are not prohibited by -ffast-math. This means that the compiler needs to generate vector code that is more or less agnostic to the input data and fairly closely corresponds to what the programmer wrote. But the compiler may still perform optimizations on the vectorized code that can affect the result of computations (for example, converting a pair of mulps and addps to an FMA equivalent).
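A sketch of that kind of transformation (whether it actually happens depends on the compiler, the target and flags such as -mfma or -ffp-contract):

```cpp
#include <immintrin.h>

// With contraction allowed, the mulps + addps below may be fused into a
// single FMA instruction, which skips the intermediate rounding step and
// can slightly change the numeric result.
__m128 mul_add(__m128 a, __m128 b, __m128 c) {
    return _mm_add_ps(_mm_mul_ps(a, b), c);
}
```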
Flags like -ffast-math mostly apply to scalar operations, where the FP values are more predictable and operations often involve standard library calls like isnan. These library calls, as well as any code that depends on their result, can be outright eliminated as a result of -ffast-math.
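A sketch of the kind of check that can silently disappear (the exact behavior varies by compiler and version):

```cpp
#include <cmath>

// Under -ffinite-math-only the compiler may assume isnan() can never
// return true and fold the whole branch away.
float sanitize(float x) {
    if (std::isnan(x))
        return 0.0f;   // may be optimized out
    return x;
}
```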
If your code has a vectorized section that passes its results to a scalar section, you should be careful if the vectorized part can produce special FP values. If you enable -ffast-math and the scalar code attempts to handle special values coming from the vector code, you may find that that handling doesn't work. In that case, either handle the special values in the vector code before exposing them to the scalar code, or disable -ffast-math, at least locally, where that special-value handling is done.
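For the first option, here is a sketch of scrubbing NaNs while still in the vector domain (this relies on the intrinsic comparison being emitted as written, as discussed above; the function name is just for illustration):

```cpp
#include <immintrin.h>

// Replace NaN lanes before the data reaches scalar code built with
// -ffast-math. cmpunordps sets a lane to all-ones where either operand
// is NaN (here both operands are v).
__m128 replace_nan(__m128 v, __m128 replacement) {
    __m128 nan_mask = _mm_cmpunord_ps(v, v);
    return _mm_or_ps(_mm_andnot_ps(nan_mask, v),          // keep ordinary lanes
                     _mm_and_ps(nan_mask, replacement));  // substitute NaN lanes
}
```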