Tags: floating-point, x86-64, cpu-architecture, simd, sse

Can the result of bitwise SIMD logical operations on packed floating points be corrupted by FTZ/DAZ or -ffinite-math-only?


Recall that the exponent mask for a 32-bit float is 0x7F80'0000, and a number is a denormal if and only if the bits selected by that mask are all zero and there is at least one non-zero bit in the mantissa; similarly, if the exponent field consists of only ones, the value is either an INF (mantissa of all zeroes) or a NaN (mantissa with some non-zero bits).
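To make that bit-level classification concrete, here is a minimal C sketch that applies the exponent and mantissa masks described above (the `classify` helper and its enum constants are illustrative names, not any standard API):

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Classification of a 32-bit float purely by its bit pattern. */
enum cls { CLS_NORMAL_OR_ZERO, CLS_DENORMAL, CLS_INF, CLS_NAN };

static enum cls classify(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);           /* well-defined type pun */
    uint32_t exponent = bits & 0x7F800000u;   /* the exponent mask */
    uint32_t mantissa = bits & 0x007FFFFFu;   /* the mantissa bits  */
    if (exponent == 0)                        /* all-zero exponent  */
        return mantissa != 0 ? CLS_DENORMAL : CLS_NORMAL_OR_ZERO;
    if (exponent == 0x7F800000u)              /* all-ones exponent  */
        return mantissa == 0 ? CLS_INF : CLS_NAN;
    return CLS_NORMAL_OR_ZERO;
}
```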

FTZ (flush-to-zero) and DAZ (denormals-are-zero) are two flags in the MXCSR register that, if set, change the behavior of the CPU (so they're a runtime thing: the same binary code will behave differently depending on these flags). The first flushes denormal results of (certain?) floating-point operations to zero, whereas the second makes the CPU treat any denormal input to (certain?) operations as zero. I couldn't find any documentation on which operations are affected by these flags, so if you know of any, I would really appreciate the reference.
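For reference, both flags can be toggled from C via the standard intrinsics macros in `xmmintrin.h`/`pmmintrin.h` (pulled in by `immintrin.h`); a sketch, with a helper name of my own:

```c
#include <assert.h>
#include <immintrin.h>

/* Illustrative helper: FTZ is bit 15 and DAZ is bit 6 of MXCSR; the
   standard _MM_SET_* macros below read-modify-write that register.
   Any modern x86-64 CPU supports the DAZ bit. */
static void set_ftz_daz(int on) {
    _MM_SET_FLUSH_ZERO_MODE(on ? _MM_FLUSH_ZERO_ON : _MM_FLUSH_ZERO_OFF);
    _MM_SET_DENORMALS_ZERO_MODE(on ? _MM_DENORMALS_ZERO_ON
                                   : _MM_DENORMALS_ZERO_OFF);
}
```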

On a different part of the spectrum, the flag -ffinite-math-only (one of the flags enabled by -ffast-math on GCC and Clang; similar considerations apply to /fp:fast on MSVC) enables codegen optimizations that are correct only under the assumption that no floating-point value encountered in the code is an INF or a NaN.

Given all that, these are my questions:

  • can a (e.g.) 0x0000'FFFF returned by a call to _mm_cmpeq_ps() be flushed to zero (possibly only on some micro-architectures)? (I'm aware that the actual comparison IS affected by DAZ, as any denormal will compare equal to zero, as explained here: https://stackoverflow.com/a/54047888, so that's not what I'm asking; but this very fact is what worries me when it comes to bitwise logical operations)
  • can a mask like 0x0000'FFFF be treated as zero when used (e.g.) as an argument for _mm_and_ps()?
  • can the compiler do optimizations that will invalidate the correctness of the generated code on masks like 0xFFFF'0000, since strictly speaking they're NaNs?

If these are actual issues, one possible workaround (at least for bitwise logical operations) would be to use the integer counterparts of the relevant instructions, but that might incur bypass delays due to crossing execution domains, depending on the CPU microarchitecture. Moreover, for comparison instructions it would just not work as-is, and one would have to write auxiliary methods that basically re-implement IEEE-754 comparisons, which is less than ideal. I'm hoping that both the register flags and the compiler flags have been implemented with the background assumption that they'll "do the right thing", but I couldn't find any information about this.
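For concreteness, the integer-domain workaround for bitwise operations would look something like this (a sketch; `and_ps_via_int` is an illustrative name — the casts compile to no instructions, only the execution domain of the AND changes):

```c
#include <assert.h>
#include <immintrin.h>

/* Bitwise AND of two FP vectors performed with pand instead of andps:
   reinterpret as integers, AND, reinterpret back. The bit pattern is
   identical either way; only the execution domain differs. */
static __m128 and_ps_via_int(__m128 a, __m128 b) {
    __m128i ia = _mm_castps_si128(a);   /* free reinterpretation */
    __m128i ib = _mm_castps_si128(b);
    return _mm_castsi128_ps(_mm_and_si128(ia, ib));
}
```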


Solution

  • can a (e.g.) 0x0000'FFFF returned by a call to _mm_cmpeq_ps() be flushed to zero (possibly only on some micro-architectures)?

    _mm_cmpeq_ps() returns all-ones or all-zeros in each 32-bit element of a vector, so either 0xFFFFFFFF or 0. You may get a 0x0000FFFF from an integer instruction, such as _mm_cmpeq_epi16.

    Whether the value is interpreted as zero depends on the instruction that receives such a value on input. Basically, DAZ and FTZ only apply to instructions that need to interpret the FP values in vectors. For example, if you pass it to an addps (_mm_add_ps) as an input argument then yes, that value may be interpreted as zero, depending on the DAZ flag in MXCSR. If you pass it to a bitwise operation such as andps or store it to memory then no, the value is used unchanged.

  • can a mask like 0x0000'FFFF be treated as zero when used (e.g.) as an argument for _mm_and_ps()?

    No, bitwise operations have equivalent effect between integer and FP domains; andps and andpd work exactly like pand. Similarly, various shuffle instructions like unpcklps and shufps don't interpret the FP values and operate on bits "as is".
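    As an illustration, the classic branchless select relies on exactly this bit-exact behavior: the all-ones lanes produced by a comparison are consumed by andps/andnps/orps without ever being interpreted as FP values. A sketch (the function name is illustrative):

```c
#include <assert.h>
#include <immintrin.h>

/* Per-lane select: where x == y, take a lane from if_eq, otherwise
   from if_ne. The mask lanes are all-ones or all-zeros; the bitwise
   ops pass those patterns through unchanged regardless of FTZ/DAZ. */
static __m128 select_eq(__m128 x, __m128 y, __m128 if_eq, __m128 if_ne) {
    __m128 mask = _mm_cmpeq_ps(x, y);          /* 0xFFFFFFFF or 0 per lane */
    return _mm_or_ps(_mm_and_ps(mask, if_eq),  /* lanes where x == y */
                     _mm_andnot_ps(mask, if_ne));
}
```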

  • can the compiler do optimizations that will invalidate the correctness of the generated code on masks like 0xFFFF'0000, since strictly speaking they're NaNs?

    You will have to ask your compiler vendor or look into its code. Formally, I don't think the compiler is prohibited from performing optimizations on the code under the assumption that there will be no NaNs or infinities.

    In practice, in vectorized code compilers usually don't interpret vector contents too deeply, especially if they originate from memory that is initialized elsewhere, which is a typical use case for vector code. Bit patterns that correspond to NaNs and infinity may appear in vectors as a result of bitwise operations or loads from memory, and those patterns are not prohibited by -ffast-math. This means that the compiler needs to generate vector code that is more-or-less ambivalent to the input data and fairly corresponds to what the programmer wrote. But the compiler may still perform optimizations on the vectorized code that can affect the result of computations (for example, convert a pair of mulps and addps to an FMA equivalent).

    Flags like -ffast-math mostly apply to scalar operations, where the FP values are more predictable and operations often involve standard library calls like isnan. These library calls, as well as any code that depends on their result, can be outright eliminated as a result of -ffast-math.

    If your code has a vectorized section that passes its results to a scalar section, you should be careful if the vectorized part can produce special FP values. If you enable -ffast-math and the scalar code attempts to handle special values coming from the vector code, you may find that this handling doesn't work. In that case, either handle the special values in the vector code before exposing them to the scalar code, or disable -ffast-math, at least locally, where that special-value handling is done.
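    One sketch of such in-vector sanitization, replacing NaN lanes with a sentinel before scalar code ever sees them (`replace_nan` is an illustrative name; this assumes the comparison intrinsic itself is not rewritten by -ffast-math in your build):

```c
#include <assert.h>
#include <immintrin.h>

/* Replace NaN lanes of v with a sentinel value. cmpord yields all-ones
   in lanes where both inputs are ordered (i.e. not NaN); the bitwise
   blend then keeps ordered lanes and substitutes the sentinel elsewhere. */
static __m128 replace_nan(__m128 v, __m128 sentinel) {
    __m128 ordered = _mm_cmpord_ps(v, v);   /* all-ones where v is not NaN */
    return _mm_or_ps(_mm_and_ps(ordered, v),
                     _mm_andnot_ps(ordered, sentinel));
}
```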