I came across these two functions:
_mm512_setzero_epi32()
_mm512_setzero_si512()
Logically, they do the same thing. I also checked the generated assembly and found it to be identical at different optimization levels.
So a simple question: why does AVX-512 have this duplicated design for zeroing an integer vector?
_mm512_setzero_epi32()
is 100% redundant, no reason to ever use it.
For coding-style reasons, I'd recommend against it. It doesn't follow the same pattern as _mm_setzero_si128()
/ _mm256_setzero_si256()
for returning a SIMD-integer vector of all zeros, which _mm512_setzero_si512()
follows.
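A minimal sketch of that naming pattern, assuming GCC or Clang on x86-64 (the `target` attribute and `__builtin_cpu_supports` are compiler extensions, and the function names `zero_vectors_are_zero` and `check_zero_vectors` are made up for illustration). It only uses the `_siNNN` zeroing intrinsics at each width and checks that the stored bytes are all zero:

```c
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* target("avx512f") also enables SSE2 and AVX2, so all three widths compile
   here without passing -mavx512f on the command line. */
__attribute__((target("avx512f")))
static int zero_vectors_are_zero(void)
{
    uint8_t buf[16 + 32 + 64];
    memset(buf, 0xFF, sizeof buf);            /* poison so the stores are observable */

    __m128i z128 = _mm_setzero_si128();
    __m256i z256 = _mm256_setzero_si256();
    __m512i z512 = _mm512_setzero_si512();    /* same _siNNN naming as the narrower ones */

    _mm_storeu_si128((__m128i *)buf, z128);
    _mm256_storeu_si256((__m256i *)(buf + 16), z256);
    _mm512_storeu_si512(buf + 48, z512);

    for (size_t i = 0; i < sizeof buf; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}

/* Returns 1 if every stored byte is zero, 0 if not,
   and -1 if the CPU lacks AVX-512F (nothing is executed in that case). */
int check_zero_vectors(void)
{
    if (!__builtin_cpu_supports("avx512f"))
        return -1;
    return zero_vectors_are_zero();
}
```

The runtime check is only there so the sketch doesn't fault on a machine without AVX-512; code that is already built with `-mavx512f` for a known target wouldn't need it.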
The situation is very similar to the useless and redundant _mm512_loadu_epi32
(which confusingly loads a whole 64-byte vector, not a 4-byte scalar). Not all compilers even support _mm512_loadu_epi32
or _mm512_loadu_epi64
, which might also be the case for _mm512_setzero_epi32
; another reason to avoid it in favour of more standard and obvious ones.
For redundant intrinsics like _mm512_loadu_epi32
and _mm512_and_epi32
, they're part of a pattern like _mm512_maskz_loadu_epi32
and _mm512_mask_loadu_epi32
; masking requires an element size, and having an unmasked intrinsic at least forms a pattern, as with _mm512_add_epi32
where different element-size versions of the same operation have to exist, and there is no _si512
version.
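To illustrate why the masked forms need an element size, here is a rough sketch, again assuming GCC or Clang on x86-64. The helper names `masked_load_demo` and `run_masked_load_demo`, the mask value, and the fill value are arbitrary choices for the example. The 16-bit mask has one bit per 32-bit element, which is what the `_epi32` suffix tells the intrinsic:

```c
#include <immintrin.h>
#include <stdint.h>

__attribute__((target("avx512f")))
static void masked_load_demo(const int32_t src[16],
                             int32_t zeroed[16], int32_t merged[16])
{
    const __mmask16 k = 0x00FF;   /* one bit per dword: select the low 8 of 16 */

    /* Zero-masking: elements whose mask bit is 0 become 0. */
    __m512i z = _mm512_maskz_loadu_epi32(k, src);

    /* Merge-masking: elements whose mask bit is 0 keep the value from `fill`. */
    __m512i fill = _mm512_set1_epi32(-1);
    __m512i m = _mm512_mask_loadu_epi32(fill, k, src);

    _mm512_storeu_si512(zeroed, z);
    _mm512_storeu_si512(merged, m);
}

/* Fills src with 1..16 and runs the demo.
   Returns 0 without touching the outputs if the CPU lacks AVX-512F. */
int run_masked_load_demo(int32_t zeroed[16], int32_t merged[16])
{
    if (!__builtin_cpu_supports("avx512f"))
        return 0;
    int32_t src[16];
    for (int i = 0; i < 16; i++)
        src[i] = i + 1;
    masked_load_demo(src, zeroed, merged);
    return 1;
}
```

With an `_epi64` masked load the same vector would be treated as 8 qword elements with an 8-bit mask, so the element size changes the result. Without a mask there is nothing for the element size to affect.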
But there are no merge-masking or zero-masking setzero
intrinsics in the current version of the intrinsics guide. So there's no pattern for setzero_epi32
to be part of.
In asm, there is no vpxor zmm
, only vpxord
and vpxorq
, because essentially all AVX-512 instructions support masking, and that means there has to be an element size. (Same for moves like vmovdqa64
/ vmovdqa32
.)
So does _mm512_setzero_epi32()
imply use of vpxord
? No, Intel's intrinsics guide actually documents it as using vpxorq
, like all other 512-bit zeroing intrinsics (including _mm512_setzero_ps()
- fun fact: EVEX vxorps
requires the AVX512DQ extension, which is not supported on KNL Xeon Phi, only on mainstream (Skylake-avx512 and later) CPUs).
As for which zeroing instruction compilers actually choose to use, it could be either, and it makes no difference.