The Intel documentation for _mm256_extractf32x4_ps and _mm256_extractf128_ps reads very similarly. I could only spot two differences:
1. _mm256_extractf128_ps takes a const int as parameter, while _mm256_extractf32x4_ps takes an int. This should not make any difference.
2. _mm256_extractf128_ps requires AVX flags, while _mm256_extractf32x4_ps requires AVX512F + AVX512VL, making the former seemingly more portable across CPUs.

What justifies the existence of _mm256_extractf32x4_ps?
Right, the int arg has to become an immediate in both cases, so it needs to be a compile-time constant after constant propagation.
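To illustrate the immediate requirement, here is a minimal sketch, assuming GCC or clang on x86-64 (the target attribute is a GNU extension, and extract_lane is a made-up helper name, not part of any library). A lane index that is only known at run time has to be turned into a branch between literal indices, since each call needs its own imm8:

```c
#include <immintrin.h>

/* Hypothetical helper: copy one 128-bit lane of an 8-float vector to out.
   `lane` is a runtime value, so it cannot be handed to the intrinsic as-is;
   each branch supplies a literal that the compiler can encode as an imm8. */
__attribute__((target("avx")))
static void extract_lane(const float in[8], int lane, float out[4])
{
    __m256 v = _mm256_loadu_ps(in);
    __m128 r = lane ? _mm256_extractf128_ps(v, 1)   /* upper lane, imm8 = 1 */
                    : _mm256_extractf128_ps(v, 0);  /* lower lane, imm8 = 0 */
    _mm_storeu_ps(out, r);
}
```

Swapping in _mm256_extractf32x4_ps would need the same literal indices; the const in one prototype changes nothing about that.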
And yeah, there's no reason to use the non-masked AVX-512VL intrinsic in C; it only really makes sense to have the masked forms, _mm256_mask_extractf32x4_ps and _mm256_maskz_extractf32x4_ps.
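As a rough sketch of what the masked forms add (again assuming GCC or clang; the function names and the -1.0f filler value are mine, purely for illustration): merge-masking keeps elements of a fallback vector where the mask bit is clear, zero-masking writes 0.0f there instead. The wrapper checks for AVX-512VL at run time so the AVX-512 code is never reached on a CPU that lacks it.

```c
#include <immintrin.h>

/* Extract the upper lane of `in` under mask k (one bit per float). */
__attribute__((target("avx512f,avx512vl")))
static void masked_extract_impl(const float in[8], unsigned char k,
                                float merged[4], float zeroed[4])
{
    __m256 v   = _mm256_loadu_ps(in);
    __m128 src = _mm_set1_ps(-1.0f);   /* illustrative fallback values */

    /* Merge-masking: elements with a clear mask bit come from src. */
    __m128 m = _mm256_mask_extractf32x4_ps(src, (__mmask8)k, v, 1);
    /* Zero-masking: elements with a clear mask bit become 0.0f. */
    __m128 z = _mm256_maskz_extractf32x4_ps((__mmask8)k, v, 1);

    _mm_storeu_ps(merged, m);
    _mm_storeu_ps(zeroed, z);
}

/* Hypothetical wrapper: returns 1 if the AVX-512VL path ran, 0 otherwise. */
static int try_masked_extract(const float in[8], unsigned char k,
                              float merged[4], float zeroed[4])
{
    if (!__builtin_cpu_supports("avx512vl"))
        return 0;
    masked_extract_impl(in, k, merged, zeroed);
    return 1;
}
```

With in = {0, 1, ..., 7} and k = 0x5, the merged result should be {4, -1, 6, -1} and the zeroed result {4, 0, 6, 0}. The plain AVX intrinsic has no equivalent of this, which is why the masked forms are the ones that earn their place.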
In asm you might want the AVX-512 version because an EVEX encoding is necessary to access ymm16..31, and of the two, only VEXTRACTF32X4 has an EVEX encoding. But this is IMO something your C compiler should be able to take care of for you, whichever intrinsic you write.
If your compiler optimizes intrinsics at all, it will know you're compiling with AVX-512 enabled and will use whatever shuffle allows it to work with the registers it picked during register allocation. (e.g. clang has a very aggressive shuffle optimizer, often using different instructions or turning shuffles into cheaper blends when possible. Or sometimes defeating efforts to write smarter code than the shuffle optimizer comes up with.)
But some compilers (notably MSVC) don't optimize intrinsics, not even doing constant-propagation through them. I think Intel ICC is also like this. (I haven't looked at ICX, their newer clang/LLVM-based compiler.) This model makes it possible to use AVX-512 intrinsics without telling the compiler that it can use AVX-512 instructions on its own. In that case, compiling _mm256_extractf128_ps to VEXTRACTF32X4 to allow usage of YMM16..31 might be a problem (especially if there weren't other AVX-512VL instructions in the same block, or in code that is guaranteed to execute whenever this one does).
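With GCC or clang, a rough analogue of keeping AVX-512 use confined to code you explicitly opted in is per-function target attributes plus a runtime check. The sketch below is only an illustration under that assumption (hi_lane and its two variants are made-up names): the AVX-only function can't legally use EVEX encodings, while the AVX-512VL one can, and the dispatcher makes sure the latter only runs where it's supported.

```c
#include <immintrin.h>

/* AVX-only path: the compiler is limited to VEX encodings and ymm0..15 here. */
__attribute__((target("avx")))
static void hi_lane_avx(const float in[8], float out[4])
{
    _mm_storeu_ps(out, _mm256_extractf128_ps(_mm256_loadu_ps(in), 1));
}

/* AVX-512VL path: EVEX encodings and ymm16..31 are allowed here. */
__attribute__((target("avx512f,avx512vl")))
static void hi_lane_avx512vl(const float in[8], float out[4])
{
    _mm_storeu_ps(out, _mm256_extractf32x4_ps(_mm256_loadu_ps(in), 1));
}

/* Hypothetical dispatcher: pick a path based on what the CPU reports. */
static void hi_lane(const float in[8], float out[4])
{
    if (__builtin_cpu_supports("avx512vl"))
        hi_lane_avx512vl(in, out);
    else
        hi_lane_avx(in, out);
}
```

Both paths produce the same four floats; the only difference is which encodings and registers the compiler may pick inside each function.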