as in the title - I want to do as below:

```c
__m128i_u* avxVar = (__m128i_u*)Var;  // Var allocated with alloc
*avxVar = _mm_set_epi64(...);         // is it ok to assign __m128i to __m128i_u?
```
Yes, but note that `__m128i_u` is not portable (e.g. to MSVC); it's what GCC/clang use internally to implement unaligned `loadu`/`storeu` intrinsics. It's exactly equivalent to do it the normal way:

```c
_mm_storeu_si128((__m128i*)Var, vec);
```

(where `vec` is any `__m128i`, e.g. the result of `_mm_set_epi64x` or a variable.)
GCC 11's `emmintrin.h` implementation of `_mm_storeu_si128` is defined like this, taking a `__m128i_u*` pointer arg, so the dereference does an unaligned access (if not optimized away):

```c
// GCC internals, quoted for reference only.
// Just use #include <immintrin.h> in your own code.
extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_storeu_si128 (__m128i_u *__P, __m128i __B)
{
  *__P = __B;
}
```
So yes, GCC's headers depend on `__m128i*` and `__m128i_u*` being compatible and implicitly convertible.

Just as `_mm_storeu_si128` is an intrinsic for `movdqu`, so is a `__m128i_u*` dereference. But really these intrinsics just exist to communicate alignment information to the compiler, and it's up to the compiler to decide when to actually load and store, just like with a deref of `char*`.
(Fun fact: `__m128i*` is a `may_alias` type, like `char*`, so you can point it at anything without violating strict aliasing. See: Is `reinterpret_cast`ing between hardware SIMD vector pointer and the corresponding type an undefined behavior?)
Also note that `_mm_set_epi64` takes `__m64` args: it was for building an SSE2 vector from two MMX vectors, not from scalar `int64_t`. You probably want `_mm_set_epi64x`:

```c
void foo(void *Var) {
    __m128i_u* avxVar = (__m128i_u*)Var;
    *avxVar = _mm_set_epi64x(1, 2);
}

void bar(void *Var) {
    _mm_storeu_si128((__m128i*)Var, _mm_set_epi64x(1, 2));
}
```
Both functions compile identically with GCC and clang (and are semantically equivalent, so will always be the same after inlining). But only the 2nd one compiles at all with MSVC, as you can see on the Godbolt compiler explorer: https://godbolt.org/z/Y8Wq96Pqs. If you disable the `#ifdef __GNUC__`, you get compiler errors on MSVC.
## GCC -O3

```asm
foo:
        movdqa  xmm0, XMMWORD PTR .LC0[rip]
        movups  XMMWORD PTR [rdi], xmm0
        ret
bar:
        movdqa  xmm0, XMMWORD PTR .LC0[rip]
        movups  XMMWORD PTR [rdi], xmm0
        ret
.LC0:
        .quad   2
        .quad   1
```
With more complex surrounding code, `_mm_loadu_si128` can fold into a memory source operand for an ALU instruction only with AVX (e.g. `vpaddb xmm0, xmm1, [rdi]`), but `_mm_load_si128` aligned loads can fold into SSE memory source operands like `paddb xmm0, [rdi]`.