I was looking for a way to cast a double to a __m128 so I can take advantage of the SSE intrinsics.
I tried using:
double d = 7654321.1234567;
__m128 ret = *reinterpret_cast<__m128*>(d);
But of course I got the message:
error: invalid cast from type ‘double’ to type ‘__m128* {aka __vector(4) float*}’
Any help would be greatly appreciated; an inline-assembly solution is fine too.
Assuming you actually wanted a vector of double (__m128d), you're looking for _mm_set_sd(d) to zero-extend a double into a __m128d, like _mm_set_pd(0, d).
See Intel's intrinsics guide. I found this one by searching on "(double" to find intrinsics that take a double (or double*) arg.
__m128 is a vector of 4 floats; did you want a double -> float conversion into the low element of a vector? Like _mm_set_ps(0.f, 0.f, 0.f, d);
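Your cast fails because d is a double value, not an address. But even with &d, you don't want to point a __m128d* at a scalar double: the vector is twice as wide as a double, so dereferencing that pointer would read 8 bytes past the end of d. If anything would have made sense, it would be (__m128d)d or a static_cast or reinterpret_cast version of that.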
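If the double is already in memory, the intrinsics that do accept a double* read only 8 bytes, so they are the safe counterpart to the pointer cast. A minimal sketch, assuming SSE2; the helper names are hypothetical. _mm_load_sd zeroes the upper element, while _mm_loadl_pd keeps the upper element of an existing vector.

```cpp
#include <emmintrin.h>  // SSE2: _mm_load_sd, _mm_loadl_pd, _mm_storeu_pd, _mm_set_pd

// Hypothetical helper: load d into element 0, element 1 = 0.0.
// Reads only sizeof(double) bytes, unlike *reinterpret_cast<const __m128d*>(&d),
// which would read 16 bytes starting at d.
inline __m128d load_scalar(const double& d) {
    return _mm_load_sd(&d);
}

// Hypothetical helper: replace element 0 of v with d, keeping v's element 1.
inline __m128d load_into_low(__m128d v, const double& d) {
    return _mm_loadl_pd(v, &d);
}
```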
But there's unfortunately no way to just cast a double to a __m128d with an undefined upper element, AFAIK, even though scalar float / double and __m128d naturally live in XMM registers. See "How to merge a scalar into a vector without the compiler wasting an instruction zeroing upper elements? Design limitation in Intel's intrinsics?"
Some compilers (well, probably still just clang) can optimize away the zero-extension or broadcast into a __m128d vector if you only use scalar intrinsics and then extract a scalar result. Other compilers actually waste instructions on the upper elements.
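The "scalar intrinsics in, scalar result out" pattern looks roughly like this sketch, assuming SSE2; sqrt_via_sse2 is a hypothetical name. Only element 0 carries data. Whether the zeroing done by _mm_set_sd survives into the generated code depends on the compiler and version, so compare the assembly (for example on Compiler Explorer) rather than assuming either way.

```cpp
#include <emmintrin.h>  // SSE2: _mm_set_sd, _mm_sqrt_sd, _mm_cvtsd_f64

// Hypothetical helper: scalar square root through SSE2 intrinsics.
inline double sqrt_via_sse2(double x) {
    __m128d v = _mm_set_sd(x);  // [x, 0.0]; the upper zero is never used
    v = _mm_sqrt_sd(v, v);      // element 0 = sqrt(x); element 1 copied from the first argument
    return _mm_cvtsd_f64(v);    // extract element 0 as a plain double
}
```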