_mm_comieq_ss difference between Clang and GCC

I have some SIMD code that's checking for equality between vars but I'm getting different results between GCC and clang when NaNs are involved:

bool equal(__m128 a, __m128 b){
    return _mm_comieq_ss(a,b) == 1;
}

int main()
{
    __m128 a, b, c;

    a = _mm_set_ss(std::numeric_limits<float>::quiet_NaN());
    b = _mm_set_ss(1.0f);
    c = _mm_set_ss(1.0f);
    
    std::cout << "comieq(a,b):" << equal(a,b) << std::endl;
    std::cout << "comieq(b,a):" << equal(b,a) << std::endl;
    std::cout << "comieq(b,c):" << equal(b,c) << std::endl;
    std::cout << "comieq(a,a):" << equal(a,a) << std::endl;

    return 0;
}

Clang and GCC return different values:

gcc:
comieq(a,b):1
comieq(b,a):1
comieq(b,c):1
comieq(a,a):1

clang:
comieq(a,b):0
comieq(b,a):0
comieq(b,c):1
comieq(a,a):0

Does anyone have an idea why this is happening? I just want to check if two regs are equal or not; is there an alternative way to do this that's consistent?

godbolt: https://godbolt.org/z/ETKenE45f

Solution

The different handling of return values when comparing NaN values was specifically changed in Clang 3.9.0. Related Link.

Although one would expect intrinsic functions to be just that – intrinsic to the CPU and not compiler-dependent, the comiss instruction produces a result in multiple bits of FLAGS. Different intrinsics check different predicates to define a single boolean return value; in asm it would be up to the programmer to use a combination of instructions like je, jb, and/or jp or setcc / cmovcc to use the compare result.

What is happening here is that GCC is checking only the ZF (zero flag) value, whereas Clang is also (correctly) checking the PF ('parity' flag: set if the comparison is unordered, i.e. one of the inputs is NaN. This matches the way integer FLAGS were set by P6 x87 fcomi, in turn matching older x87 fcom / fstsw ax / sahf).

I shall offer a short quotation from the discussion linked above, which may shed some light on the reasoning behind the decision made by the LLVM (clang) team:

In Clang 3.8.0 and before, comparing two scalars of which at least one is a NaN would return 1. This is also the behavior that GCC, Visual Studio, and our current Emscripten code implements. This behavior is unintuitive in the sense that comparing NaNs in floats have the opposite tradition in IEEE-754, i.e. "nothing is equal to a NaN".

Intel is the original author of these intrinsics, and it must be admitted that these functions have long suffered from poor documentation. Intel doesn't spec in detail how these intrinsics should work with respect to NaNs (https://software.intel.com/en-us/node/514308), but presumably the reference implementation in their own compiler was held as the ground truth. The behavior that GCC, VS and Clang <= 3.8 each follow likely comes from adhering to the original code as implemented in Intel's compilers, where _mm_comieq_ss is implemented to perform the COMISS instruction and return the resulting zero flag (ZF) register state as the output int value of the intrinsic function. The COMISS instruction itself is though well documented since it's part of the ISA, and is shown e.g. at http://x86.renejeschke.de/html/file_module_x86_id_44.html. This shows the origin of the unexpected NaN behavior, since the zero flag is set if the comparison is equal, or if the comparison result is unordered, i.e. at least one of the registers is a NaN.

Following the comment from Peter Cordes, it is now clear that the (modified) clang behaviour is correct and the "poor documentation" from Intel referred to in the above citation has been corrected. The Intel documentation for _mm_comieq_ss now makes it clear that any NaN value present should yield a return value of zero:

Operation

RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] == b[31:0] ) ? 1 : 0