c sse simd intrinsics

What is the difference between loadu/lddqu and assignment operator?


I am using SIMD vectors to do some computations, and I am curious about the difference between the following two statements:

__m128i vector2 = vector1;
__m128i vector2 = _mm_loadu_si128(&vector1);

So, what's the difference between the two statements above?


Solution

  • As Peter Cordes said in his comment, if vector1 really is a __m128i, the _mm_loadu_si128 version is just unnecessarily complicated.

    However, that's not the use case for _mm_loadu_si128. While the type of the argument is __m128i const*, that's really more about a lack of good options and poor decisions.

    _mm_loadu_si128 is really meant to load any 16 bytes of data into a vector register. If you want to load data which is already aligned to a 16-byte boundary, you should use _mm_load_si128 instead. If your data isn't aligned to a 16-byte boundary, it's not really a __m128i, so the type of the parameter is misleading at best.

    The reason Intel (or whoever) chose to use __m128i const* isn't completely clear, but to be honest there aren't a lot of good options. int const* wouldn't really make sense because what we're trying to load isn't always 32-bit signed integers. They could have made an _mm_loadu_epi8, _mm_loadu_epi16, _mm_loadu_epi32, etc., but even that wouldn't be quite right because the data doesn't need to be aligned to _Alignof(int), _Alignof(short), etc., though char would actually work well here.

    The right choice would probably be to make the argument void*, but I guess Intel wanted to signal that they really wanted 16 bytes of data. char mem_addr[static 16] would be okay in C99+, but not in C++, and while SSE2 came out at around the same time as C99 many compilers didn't support C99 (MSVC still doesn't!).

    Basically, for this function, ignore the types of the parameters and read the description.
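To make the points above concrete, here is a minimal sketch in C, assuming an x86 target with SSE2 and a C11 compiler. The helper names (load16_any, load16_aligned, bytes_equal, demo_loads) are made up for this illustration and are not part of any Intel API. load16_any takes a void const*, which is roughly what the answer suggests the intrinsic's parameter could have been.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical wrapper: load 16 bytes from any address, aligned or not. */
static __m128i load16_any(const void *p) {
    return _mm_loadu_si128((const __m128i *)p);
}

/* Hypothetical wrapper: load 16 bytes; p must be 16-byte aligned. */
static __m128i load16_aligned(const void *p) {
    assert(((uintptr_t)p & 15u) == 0);
    return _mm_load_si128((const __m128i *)p);
}

/* Returns 1 if all 16 bytes of a and b are equal, 0 otherwise. */
static int bytes_equal(__m128i a, __m128i b) {
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}

/* Returns 1 if every expectation described in the comments holds. */
static int demo_loads(void) {
    _Alignas(16) uint8_t buf[32];
    for (int i = 0; i < 32; i++) buf[i] = (uint8_t)i;

    __m128i a = load16_aligned(buf);      /* bytes 0..15, aligned address */
    __m128i u = load16_any(buf + 3);      /* bytes 3..18, unaligned address */

    /* The two statements from the question: both yield the same value. */
    __m128i copy1 = u;                    /* plain assignment */
    __m128i copy2 = _mm_loadu_si128(&u);  /* same result, just more verbose */

    uint8_t out[16];
    _mm_storeu_si128((__m128i *)out, u);  /* store back to inspect the bytes */

    return bytes_equal(copy1, copy2) && !bytes_equal(a, u)
        && out[0] == 3 && out[15] == 18;
}
```

Calling demo_loads() from your own main should return 1. When the source really is a __m128i, plain assignment is the simpler choice; the unaligned load earns its keep only when the bytes come from memory that is not known to be 16-byte aligned, such as buf + 3 here.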