
What is the default addition operator '+' for __m64?


I found that the following C code compiles successfully on x86_64 with gcc 10.1.0:

#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

typedef union{
    __m64 x;
#if defined(__arm__) || defined(__aarch64__)
    int32x2_t d[1];
#endif
    uint8_t i8u[8];
}u_m64;

int main()
{
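    /* Initialise the operands so the addition does not read indeterminate values. */
    u_m64 a = { .i8u = {1, 2, 3, 4, 5, 6, 7, 8} };
    u_m64 b = { .i8u = {1, 1, 1, 1, 1, 1, 1, 1} };
    u_m64 c;
    c.x = a.x + b.x;   /* compiles: '+' is accepted on __m64 */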
    
    return 0;
}

But there are lots of add functions for __m64, such as _mm_add_pi16, _mm_hadd_pi16, _mm_add_si64 and so on (the same applies to __m128, __m256, ...). So which one is called by the '+' operator? And how can operator overloading be used in a C file?


Solution

  • Yeah, gcc and clang provide basic operators for builtin SIMD types, which is frankly so beyond stupid that it's not even remotely funny :(

    Anyhow, this mechanism doesn't work the same way as operator overloading in C++. What the compiler is actually doing is treating __m64 as a built-in vector type (through the gcc/clang vector extensions), so the operators exist at the language level, just as they do for int or float, rather than at the overload level. (That's why it works in C.)

    In this case the '+' is an element-wise (vertical) add rather than a horizontal add; the lane width it uses comes from however the compiler's header declares __m64.
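    To make the language-level mechanism concrete, here is a minimal sketch using the vector_size attribute that gcc and clang both accept. The type name v2i32 and the two helper functions are made up for illustration; this is not how any header actually declares __m64.

```c
#include <stdint.h>

/* Hypothetical type: a 64-bit vector holding two 32-bit ints,
   declared with the GNU vector_size attribute (gcc and clang). */
typedef int32_t v2i32 __attribute__((vector_size(8)));

/* '+' on a vector type is a built-in, lane-by-lane addition.
   No overload is involved, which is why this is valid C. */
v2i32 add_v2i32(v2i32 a, v2i32 b)
{
    return a + b;
}

/* Vector types can be subscripted to read a single lane. */
int32_t lane_v2i32(v2i32 v, int i)
{
    return v[i];
}
```

    Adding {1, 2} and {10, 20} this way gives {11, 22}: each lane is summed independently, at the width fixed by the typedef.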

    However, we now hit the biggest problem! The type __m64 says nothing about how you intend its 64 bits to be interpreted!

    Within any given __m64, we could be storing any one of:

    • 8 x int8
    • 4 x int16
    • 2 x int32
    • 8 x uint8
    • 4 x uint16
    • 2 x uint32

    For addition (ignoring the saturated variants) that means the addition operator could be calling any one of these perfectly valid choices:

    • _mm_add_pi8
    • _mm_add_pi16
    • _mm_add_pi32

    I don't know which of those instructions gcc/clang ends up emitting in this context, but since the operator can only pick one lane width, it is going to be the wrong instruction for two of the three layouts :(
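
    A rough sketch of why the lane width matters: the same two 64-bit patterns give three different sums depending on whether they are added as 8 x 8-bit, 4 x 16-bit or 2 x 32-bit lanes. To stay portable this uses gcc/clang vector extension types rather than the MMX intrinsics themselves; the macro and function names are invented for the example.

```c
#include <stdint.h>
#include <string.h>

/* Generates a function that reinterprets two 64-bit patterns as
   vectors of lane_t, adds them lane by lane (wrapping, no saturation),
   and returns the resulting 64 bits. memcpy is used for the
   reinterpretation so no aliasing rules are broken. */
#define DEFINE_LANE_ADD(name, lane_t)                              \
    uint64_t name(uint64_t x, uint64_t y)                          \
    {                                                              \
        typedef lane_t vec_t __attribute__((vector_size(8)));      \
        vec_t a, b, c;                                             \
        uint64_t r;                                                \
        memcpy(&a, &x, sizeof a);                                  \
        memcpy(&b, &y, sizeof b);                                  \
        c = a + b;                                                 \
        memcpy(&r, &c, sizeof r);                                  \
        return r;                                                  \
    }

DEFINE_LANE_ADD(add_as_8x8,  uint8_t)   /* same lane width as _mm_add_pi8  */
DEFINE_LANE_ADD(add_as_4x16, uint16_t)  /* same lane width as _mm_add_pi16 */
DEFINE_LANE_ADD(add_as_2x32, uint32_t)  /* same lane width as _mm_add_pi32 */
```

    With x = 0x0000FFFF0000FFFF and y = 0x0000000100000001, the 8-bit version returns 0x0000FF000000FF00, the 16-bit version returns 0, and the 32-bit version returns 0x0001000000010000. Only the explicit intrinsic (or an explicitly typed vector) tells the compiler which of those you meant.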