x86 intrinsics avx512

Purpose of the `vmovdqu8` / 16 / 32 / 64 instructions and the `_mm_loadu_epi8` / 16 / 32 / 64 intrinsics


There is `movdqu`, available via `_mm_loadu_si128`, which requires SSE2.

There is also `vmovdqu8` (16, 32, 64), available via `_mm_loadu_epi8` (16, 32, 64), which requires AVX512BW + AVX512VL for the 8- and 16-bit forms or AVX512F + AVX512VL for the 32- and 64-bit forms.

What is the purpose of the latter if they apparently do the same thing?

If the purpose is the mask, then why are the unmasked `_mm_loadu_epi8` variants exposed as intrinsics?


Solution

  • This mostly summarizes the answer already given in the comments:

    • The instructions exist only to support masked operations, so the non-masked intrinsics are generally useless. They may have been defined for orthogonality.
    • GCC has chosen not to expose at least some of these useless intrinsics; see "error: '_mm512_loadu_epi64' was not declared in this scope".
    • They could potentially have been useful for marking AVX512 code paths in compilers that allow any intrinsic to be used without enabling the CPU feature in compiler options or otherwise, but in practice they are not useful this way.
    • So far, the presence of these unmasked intrinsics only creates confusion; don't use them.
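
To illustrate the masked use case that the element-sized instructions exist for, here is a minimal sketch. The helper name `load_prefix16` is hypothetical and not from the original answer. It assumes the AVX-512 path is only compiled when the compiler defines `__AVX512BW__` and `__AVX512VL__` (for example with `-mavx512bw -mavx512vl`), and it falls back to plain `memcpy` otherwise. It uses the zero-masking form `_mm_maskz_loadu_epi8`, where the byte granularity of the mask is what `vmovdqu8` adds over `movdqu`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__AVX512BW__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Hypothetical helper: copy the first n (<= 16) bytes of src into out
   and zero the remaining bytes of out. */
static void load_prefix16(const uint8_t *src, unsigned n, uint8_t out[16])
{
    assert(n <= 16);
#if defined(__AVX512BW__) && defined(__AVX512VL__)
    /* One mask bit per byte: the low n bits are set. */
    __mmask16 k = (__mmask16)((1u << n) - 1u);
    /* Masked vmovdqu8: bytes whose mask bit is clear are zeroed
       and are not read from memory. */
    __m128i v = _mm_maskz_loadu_epi8(k, src);
    _mm_storeu_si128((__m128i *)out, v);
#else
    /* Portable fallback with the same observable result. */
    memset(out, 0, 16);
    memcpy(out, src, n);
#endif
}
```

For a full, unmasked 16-byte load there is no such distinction to make, so `_mm_loadu_si128` remains the natural choice even in AVX-512 code.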