There is movdqu
available via _mm_loadu_si128
that requires SSE2.
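A minimal sketch of the SSE2 path, assuming an x86-64 target where SSE2 is baseline. The helper name copy16_unaligned is hypothetical. It only shows that _mm_loadu_si128 / _mm_storeu_si128 accept addresses with no alignment requirement.

```c
#include <emmintrin.h> /* SSE2: _mm_loadu_si128, _mm_storeu_si128 */

/* Hypothetical helper: copy 16 bytes between possibly unaligned addresses.
   On SSE2 targets these intrinsics typically compile to movdqu
   (vmovdqu when the compiler is allowed to use VEX encoding). */
void copy16_unaligned(unsigned char *dst, const unsigned char *src)
{
    __m128i v = _mm_loadu_si128((const __m128i *)src); /* unaligned load */
    _mm_storeu_si128((__m128i *)dst, v);               /* unaligned store */
}
```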
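There is vmovdqu8 (and vmovdqu16, vmovdqu32, vmovdqu64)
available via _mm_loadu_epi8 (and _mm_loadu_epi16, _mm_loadu_epi32, _mm_loadu_epi64),
which requires AVX512BW + AVX512VL for the 8- and 16-bit element forms
or AVX512F + AVX512VL for the 32- and 64-bit element forms.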
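A sketch of the observation behind the question: unmasked, _mm_loadu_epi8 and _mm_loadu_si128 load the same 16 bytes. It assumes GCC or Clang (for the target attribute and __builtin_cpu_supports) and a compiler recent enough to declare _mm_loadu_epi8. The function names are hypothetical, and the AVX-512 path runs only if the CPU reports AVX512BW and AVX512VL.

```c
#include <immintrin.h>

/* Built for AVX512BW + AVX512VL through a GCC/Clang target attribute, so the
   rest of the file compiles without -mavx512bw -mavx512vl. */
__attribute__((target("avx512bw,avx512vl")))
static int loadu_epi8_matches_si128(const void *p)
{
    __m128i a = _mm_loadu_epi8(p);                   /* vmovdqu8 or (v)movdqu */
    __m128i b = _mm_loadu_si128((const __m128i *)p); /* (v)movdqu */
    /* Both loads have identical semantics, so a compiler may merge them.
       All 16 byte lanes equal -> movemask == 0xFFFF. */
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}

/* Returns 1 if the loads agree, 0 if they differ, -1 if the CPU lacks
   AVX512BW/AVX512VL and the comparison was skipped. */
int unmasked_loads_agree(const void *p)
{
    if (!__builtin_cpu_supports("avx512bw") || !__builtin_cpu_supports("avx512vl"))
        return -1;
    return loadu_epi8_matches_si128(p);
}
```

Inspecting the generated assembly for the target function (for example on Compiler Explorer) is the more direct way to see which instruction each intrinsic turns into with a given compiler.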
What is the purpose of the latter if they apparently do the same thing?
If the purpose is the mask, then why is the unmasked _mm_loadu_epi8
exposed as an intrinsic?
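For contrast, a sketch of what the mask adds: _mm_maskz_loadu_epi8 loads only the selected bytes, zeroes the rest, and suppresses faults on masked-off bytes, so a buffer tail shorter than 16 bytes can be read without touching memory past its end. The helpers (load_tail_avx512, load_tail_scalar, sum_tail) are hypothetical. The sketch assumes GCC or Clang, x86-64, and n <= 16, and falls back to a copy through a zeroed buffer when AVX512BW/AVX512VL are unavailable.

```c
#include <immintrin.h>
#include <stddef.h>
#include <string.h>

/* Load n (0..16) bytes from p into the low lanes, zeroing the other lanes.
   Masked-off bytes are not accessed, so p + n may be the end of a mapping. */
__attribute__((target("avx512bw,avx512vl")))
static __m128i load_tail_avx512(const unsigned char *p, size_t n)
{
    __mmask16 k = (__mmask16)((1u << n) - 1u); /* low n bits set */
    return _mm_maskz_loadu_epi8(k, p);
}

/* Fallback without AVX-512: copy through a zeroed 16-byte buffer. */
static __m128i load_tail_scalar(const unsigned char *p, size_t n)
{
    unsigned char tmp[16] = {0};
    memcpy(tmp, p, n);
    return _mm_loadu_si128((const __m128i *)tmp);
}

/* Hypothetical helper: sum of the n <= 16 bytes at p. */
unsigned sum_tail(const unsigned char *p, size_t n)
{
    int has_avx512 = __builtin_cpu_supports("avx512bw") &&
                     __builtin_cpu_supports("avx512vl");
    __m128i v = has_avx512 ? load_tail_avx512(p, n) : load_tail_scalar(p, n);
    /* psadbw against zero: each 64-bit lane holds the sum of its 8 bytes. */
    __m128i s = _mm_sad_epu8(v, _mm_setzero_si128());
    return (unsigned)_mm_cvtsi128_si32(s) + (unsigned)_mm_extract_epi16(s, 4);
}
```

Calling sum_tail(p, 3) on bytes {1, 2, 3, ...} gives 6 on either path; only the AVX-512 path avoids reading bytes past p + 3.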
Mostly summarizing the answers already given in the comments:
_allow_cpu_features to mark codepaths … _mm_loadu_epi8
in question; see https://godbolt.org/z/9aaha1h8r