Why does AVX-512 have duplicate intrinsics for setting a vector to zero?

I came across these two intrinsics:

_mm512_setzero_epi32()
_mm512_setzero_si512()

Logically, they do the same thing. I also checked the generated assembly and found it was identical at different optimization levels.

So the question is simple: why does AVX-512 have such a duplicated design for zeroing an integer vector?

Solution

`_mm512_setzero_epi32()` is 100% redundant; there is no reason to ever use it

For coding-style reasons, I'd recommend against it. It doesn't follow the pattern of _mm_setzero_si128() / _mm256_setzero_si256() for returning a SIMD integer vector of all zeros, which _mm512_setzero_si512() does follow.
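As a quick sanity check that the two spellings really are interchangeable, here is a minimal sketch. The function names `compare_zero_vectors` and `zero_intrinsics_agree` are made up for illustration. It assumes GCC or Clang on x86 (for the `target` attribute and `__builtin_cpu_supports`), and it assumes your compiler provides `_mm512_setzero_epi32` at all, which not every compiler may do.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Build only this function for AVX-512F, so no -mavx512f flag is needed
   (GCC/Clang function attribute; an assumption about the toolchain). */
__attribute__((target("avx512f")))
static int compare_zero_vectors(void)
{
    __m512i a = _mm512_setzero_si512();   /* conventional spelling */
    __m512i b = _mm512_setzero_epi32();   /* redundant spelling */

    /* One mask bit per 32-bit element, set where a[i] == b[i]. */
    __mmask16 same = _mm512_cmpeq_epi32_mask(a, b);
    /* One mask bit per 32-bit element, set where (a[i] & a[i]) != 0. */
    __mmask16 nonzero = _mm512_test_epi32_mask(a, a);

    return same == 0xFFFF && nonzero == 0;
}
#endif

/* Hypothetical helper: returns 1 if both intrinsics give the same
   all-zero vector, 0 if they differ, and -1 if AVX-512F is unavailable. */
int zero_intrinsics_agree(void)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx512f"))
        return compare_zero_vectors();
#endif
    return -1;
}
```

Calling `zero_intrinsics_agree()` on a machine with AVX-512F should never return 0; the runtime check is only there so the program doesn't fault on CPUs without AVX-512.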

The situation is very similar to the useless and redundant _mm512_loadu_epi32 (which confusingly loads a whole 64-byte vector, not a 4-byte scalar). Not all compilers even support _mm512_loadu_epi32 or _mm512_loadu_epi64, which might also be the case for _mm512_setzero_epi32; another reason to avoid it in favour of more standard and obvious ones.

Redundant intrinsics like _mm512_loadu_epi32 and _mm512_and_epi32 are at least part of a pattern with _mm512_maskz_loadu_epi32 and _mm512_mask_loadu_epi32: masking requires an element size, so having an unmasked intrinsic at least forms a pattern, as with _mm512_add_epi32, where different element-size versions of the same operation have to exist and there is no _si512 version.
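To illustrate why masking needs an element size, here is a small sketch that applies the same mask value to a dword-granular and a qword-granular zero-masked load. The helper names `masked_loads` and `masked_load_counts` are hypothetical, and the code again assumes GCC or Clang on x86.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

__attribute__((target("avx512f")))
static void masked_loads(const int *src, int *out32, int *out64)
{
    /* Mask 0x3 selects elements 0 and 1 in both cases, but the element
       width differs: two dwords (8 bytes) vs. two qwords (16 bytes). */
    __m512i v32 = _mm512_maskz_loadu_epi32((__mmask16)0x3, src);
    __m512i v64 = _mm512_maskz_loadu_epi64((__mmask8)0x3, src);
    _mm512_storeu_si512(out32, v32);
    _mm512_storeu_si512(out64, v64);
}
#endif

/* Hypothetical helper: counts how many of the 16 dwords are nonzero after
   each zero-masked load. Returns 0 on success, -1 without AVX-512F. */
int masked_load_counts(int *n32, int *n64)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx512f")) {
        int src[16], out32[16], out64[16];
        for (int i = 0; i < 16; i++)
            src[i] = i + 1;               /* every source dword is nonzero */
        masked_loads(src, out32, out64);
        *n32 = 0;
        *n64 = 0;
        for (int i = 0; i < 16; i++) {
            *n32 += (out32[i] != 0);
            *n64 += (out64[i] != 0);
        }
        return 0;
    }
#endif
    return -1;
}
```

With AVX-512F available, the epi32 load keeps 2 dwords and the epi64 load keeps 4, even though the mask value is identical. An unmasked full-vector load has no such distinction, which is why _mm512_loadu_epi32 adds nothing over _mm512_loadu_si512.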

But there are no merge-masking or zero-masking setzero intrinsics in the current version of the intrinsics guide. So there's no pattern for setzero_epi32 to be part of.

In asm, there is no vpxor zmm, only vpxord and vpxorq, because essentially all AVX-512 instructions support masking, and that means there has to be an element size. (Same for moves like vmovdqa64 / 32.)

So does _mm512_setzero_epi32() imply use of vpxord? No, Intel's intrinsics guide actually documents it as using vpxorq, like all other 512-bit zeroing intrinsics, including _mm512_setzero_ps(). (Fun fact: EVEX vxorps requires the AVX512DQ extension, which KNL Xeon Phi does not support; only mainstream CPUs, Skylake-avx512 and later, have it.)

As for which zeroing instruction compilers actually choose, it could be either, and it makes no difference.