The four functions _mm256_broadcastb_epi8, _mm256_broadcastw_epi16, _mm256_broadcastd_epi32 and _mm256_broadcastq_epi64
are intrinsics for the VPBROADCASTB, VPBROADCASTW, VPBROADCASTD and VPBROADCASTQ instructions, respectively.
According to Intel's documentation, "Intel® Advanced Vector Extensions Programming Reference",
those instructions may take an 8-bit, 16-bit, 32-bit or 64-bit memory location, respectively, as their source operand.
Page 5-230:
The source operand is 8-bit, 16-bit 32-bit, 64-bit memory location or the low 8-bit, 16-bit 32-bit, 64-bit data in an XMM register
However, the intrinsic API (of Intel, MSVS and gcc) for those instructions takes a __m128i parameter. Now, if I have a variable of a basic type, say 'short', what is the most efficient and cross-platform way (at least between MSVS and gcc) to pass that variable to the corresponding broadcast intrinsic (_mm256_broadcastw_epi16 in the case of short)?
For example:
void func1(uint8_t v) {
    __m256i a = _mm256_broadcastb_epi8(<convert_to__m128i>(v));
    ...
}
void func1(uint16_t v) {
    __m256i a = _mm256_broadcastw_epi16(<convert_to__m128i>(v));
    ...
}
void func1(uint32_t v) {
    __m256i a = _mm256_broadcastd_epi32(<convert_to__m128i>(v));
    ...
}
void func1(uint64_t v) {
    __m256i a = _mm256_broadcastq_epi64(<convert_to__m128i>(v));
    ...
}
What should <convert_to__m128i> be so that it is as efficient and as cross-platform as possible?
With MSVS, for example, one can do:
void func1(uint16_t v) {
    __m128i vt;
    vt.m128i_u16[0] = v;  // MSVS-only: __m128i is a union there; only lane 0 is set
    __m256i a = _mm256_broadcastw_epi16(vt);
    ...
}
But without optimizations the compiler may first load an XMM register and only then use it in VPBROADCASTW, whereas with optimizations it may use the memory location of v directly. This approach is also valid only for MSVS, because it relies on member access that only MSVS's __m128i type exposes.
There are already sequence/compound intrinsics which do exactly what you want:
_mm256_set1_epi8, _mm256_set1_epi16, _mm256_set1_epi32 and _mm256_set1_epi64x
From Intel's Intrinsics Guide:
Broadcast 8-bit integer a to all elements of dst. This intrinsic may generate the vpbroadcastb.
Using those, you should be able to trust the compiler to generate optimal code for whichever form of the operand it ends up with.
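As a rough illustration of the set1 route, here is a minimal sketch that takes the scalar directly, so no __m128i temporary is written by hand. The names broadcast256 and broadcast_lanes16 are hypothetical, and the plain-fill fallback is only an assumption added so the snippet still builds when AVX is not enabled (gcc/clang: -mavx2, MSVS: /arch:AVX2).

```cpp
#include <array>
#include <cstdint>

#if defined(__AVX__)
#include <immintrin.h>

// Hypothetical overload set mirroring the func1 family from the question.
// Each overload hands the scalar straight to the matching set1 intrinsic.
inline __m256i broadcast256(std::uint8_t v)  { return _mm256_set1_epi8(static_cast<char>(v)); }
inline __m256i broadcast256(std::uint16_t v) { return _mm256_set1_epi16(static_cast<short>(v)); }
inline __m256i broadcast256(std::uint32_t v) { return _mm256_set1_epi32(static_cast<int>(v)); }
inline __m256i broadcast256(std::uint64_t v) { return _mm256_set1_epi64x(static_cast<long long>(v)); }
#endif

// Hypothetical helper: broadcasts v to sixteen 16-bit lanes and copies them
// out so the result can be inspected from ordinary code.
inline std::array<std::uint16_t, 16> broadcast_lanes16(std::uint16_t v) {
    std::array<std::uint16_t, 16> out{};
#if defined(__AVX__)
    const __m256i a = broadcast256(v);
    // Unaligned store: std::array gives no 32-byte alignment guarantee.
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out.data()), a);
#else
    out.fill(v);  // assumption: scalar fallback when AVX is not enabled
#endif
    return out;
}
```

Whether the compiler actually emits vpbroadcastw for this depends on the compiler, the optimization level and the enabled instruction set, so it is worth checking the generated assembly.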
I use the Intel Intrinsics Guide when doing stuff like this. It is helpful because you can search in reverse from a mnemonic (in this case you knew you eventually wanted vpbroadcastb) and it will tell you which intrinsics are related to it.
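If the explicit broadcast intrinsic is still wanted, one portable candidate for the <convert_to__m128i> placeholder in the question is _mm_cvtsi32_si128, which both MSVS and gcc provide (for 64-bit values, _mm_cvtsi64_si128 exists on x64 targets). The sketch below is only an illustration under those assumptions: the helper name broadcastw_explicit is hypothetical, and the plain-fill branch is added so the snippet builds when AVX2 is not enabled.

```cpp
#include <array>
#include <cstdint>

#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical helper: broadcasts v with the explicit VPBROADCASTW intrinsic
// and copies the sixteen 16-bit lanes out for inspection.
inline std::array<std::uint16_t, 16> broadcastw_explicit(std::uint16_t v) {
    std::array<std::uint16_t, 16> out{};
#if defined(__AVX2__)
    // Move the scalar into the low 32 bits of an XMM register; upper bits are zeroed.
    const __m128i lo = _mm_cvtsi32_si128(v);
    // Replicate the low 16 bits of lo into all sixteen lanes of a YMM register.
    const __m256i a = _mm256_broadcastw_epi16(lo);
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out.data()), a);
#else
    out.fill(v);  // assumption: scalar fallback when AVX2 is not enabled
#endif
    return out;
}
```

The set1 intrinsics remain the simpler choice; this form mainly makes the intended instruction explicit in the source.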