What is the default addition operator '+' for __m64?

I found that the following C code compiles successfully on x86_64 with gcc 10.1.0.

#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

typedef union{
    __m64 x;
#if defined(__arm__) || defined(__aarch64__)
    int32x2_t d[1];
#endif
    uint8_t i8u[8];
}u_m64;

int main()
{
    u_m64 a, b, c;
    c.x = a.x + b.x;
    
    return 0;
}

But there are lots of add functions for __m64, such as _mm_add_pi16, _mm_hadd_pi16, _mm_add_si64 and so on (the same applies to __m128, __m256, ...). So which one is called by the '+' operator? And how can operator overloading be used in a C file?

Solution

Yeah, gcc and clang provide basic operators for builtin SIMD types, which is frankly so beyond stupid that it's not even remotely funny :(

Anyhow, this mechanism doesn't work the same way as operator overloading in C++. Through the gcc/clang vector extensions, __m64 is declared as a vector type that the compiler treats like a true built-in type (such as int/float), meaning the operators exist at the language level rather than the overload level. (That's why it works in C.)

In this case it is an element-wise (vertical) add rather than a horizontal add: the vector extension operators always work lane by lane.
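To illustrate the language-level mechanism, here is a minimal sketch using the gcc/clang `vector_size` attribute directly. The type name `v2si` and the helpers `add_v2si` and `store_v2si` are hypothetical names made up for this example; they are not part of any intrinsic header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type name: two 32-bit ints packed into 8 bytes, declared with
   the gcc/clang vector extension. */
typedef int32_t v2si __attribute__((vector_size(8)));

static_assert(sizeof(v2si) == 8, "v2si should occupy 8 bytes");

/* '+' here is a built-in, element-wise operator on the vector type.
   No function is called and no C++-style overload is involved. */
static v2si add_v2si(v2si a, v2si b)
{
    return a + b;
}

/* Copies the two lanes into a plain array so they can be inspected. */
static void store_v2si(int32_t out[2], v2si v)
{
    memcpy(out, &v, sizeof(v));
}
```

Adding `{1, 2}` and `{10, 20}` with `add_v2si` gives lanes 11 and 22: each lane is added independently, with no horizontal mixing.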

However, we now hit the biggest problem! - The __m64 type itself does NOT tell the compiler how your data is laid out in those 64 bits!

Within any given __m64, we could be storing any one of:

8 x int8
4 x int16
2 x int32
8 x uint8
4 x uint16
2 x uint32

For addition (ignoring the saturated variants) that means the addition operator could correspond to any one of these perfectly valid choices:

_mm_add_pi8
_mm_add_pi16
_mm_add_pi32

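The operator simply uses the element type from the header's typedef for __m64. In gcc's mmintrin.h that typedef is a vector of int, so '+' adds 2 x 32-bit lanes, like _mm_add_pi32; other compilers may declare __m64 differently. Whichever width the header picks, it's the wrong instruction for the other two layouts, i.e. 66.66% of the cases above :(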
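To see why the lane width matters, here is a small sketch that adds the same 8 bytes once as 8-bit lanes and once as 16-bit lanes. It uses the gcc/clang vector extensions rather than the MMX intrinsics so it stays portable. The type names `v8u8` and `v4u16` and the helpers `add_as_u8` and `add_as_u16` are hypothetical, and the byte values described afterwards assume a little-endian target such as x86_64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical vector types: the same 8 bytes viewed with two lane widths. */
typedef uint8_t  v8u8  __attribute__((vector_size(8)));
typedef uint16_t v4u16 __attribute__((vector_size(8)));

static_assert(sizeof(v8u8) == sizeof(v4u16), "both views cover the same 8 bytes");

/* Adds two 8-byte buffers as eight independent 8-bit lanes. */
static void add_as_u8(uint8_t out[8], const uint8_t a[8], const uint8_t b[8])
{
    v8u8 va, vb, vr;
    memcpy(&va, a, 8);
    memcpy(&vb, b, 8);
    vr = va + vb;            /* each byte wraps around on its own */
    memcpy(out, &vr, 8);
}

/* Adds the same buffers as four independent 16-bit lanes. */
static void add_as_u16(uint8_t out[8], const uint8_t a[8], const uint8_t b[8])
{
    v4u16 va, vb, vr;
    memcpy(&va, a, 8);
    memcpy(&vb, b, 8);
    vr = va + vb;            /* carries propagate within each 16-bit lane */
    memcpy(out, &vr, 8);
}
```

With a first byte of 0xFF in one input and 0x01 in the other (all other bytes zero), the 8-bit add wraps to 0x00 and leaves the second byte at 0x00, while the 16-bit add carries into the second byte and leaves it at 0x01. Same bits in, different bits out, which is why a single '+' cannot be right for every layout.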