c++ assembly sse inline-assembly intrinsics

Cast from double to __m128

I was looking for a way to cast a double to an __m128 so I could take advantage of the intrinsic instructions.

I tried using:

double d = 7654321.1234567;
__m128 ret = *reinterpret_cast<__m128*>(d);

But of course I got the message:

error: invalid cast from type ‘double’ to type ‘__m128* {aka __vector(4) float*}’

Any help would be greatly appreciated; an inline-assembly solution is fine too.

Solution

Assuming you actually wanted a vector of double (__m128d), you're looking for _mm_set_sd(d), which zero-extends a double into a __m128d, giving the same result as _mm_set_pd(0, d).
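A minimal sketch of that, assuming an x86 target with SSE2 (baseline on x86-64). The helper names double_to_m128d and store_both are hypothetical; _mm_set_sd, _mm_set_pd and _mm_storeu_pd are the real intrinsics from <emmintrin.h>. Note that _mm_set_pd takes its arguments from the high element down to the low one, which is why the 0 comes first.

```cpp
#include <emmintrin.h>  // SSE2: __m128d, _mm_set_sd, _mm_set_pd, _mm_storeu_pd

// Hypothetical helper: put d in the low element of a __m128d and zero the
// upper element, using _mm_set_sd.
inline __m128d double_to_m128d(double d) {
    return _mm_set_sd(d);  // same vector value as _mm_set_pd(0.0, d)
}

// Hypothetical helper: copy both elements out to memory for inspection.
// out[0] receives the low element, out[1] the high element.
inline void store_both(__m128d v, double out[2]) {
    _mm_storeu_pd(out, v);
}
```

For example, storing double_to_m128d(7654321.1234567) with store_both gives out[0] == 7654321.1234567 and out[1] == 0.0, exactly as _mm_set_pd(0.0, 7654321.1234567) would.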

See Intel's intrinsics guide. I found this one by searching for "(double" to list the intrinsics that take a double (or double*) argument.

__m128 is a vector of 4 floats. Did you want a double -> float conversion into the low element of a vector? For example, _mm_set_ps(0.f, 0.f, 0.f, d);
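If float was what you wanted, here is a sketch of the conversion, assuming SSE is available. The two helper names are hypothetical; _mm_set_ps, _mm_set_ss, _mm_cvtss_f32 and _mm_storeu_ps are real intrinsics from <xmmintrin.h>. The conversion is lossy: 7654321.1234567 is not exactly representable as a float, so the low element holds the nearest float value.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_set_ps, _mm_set_ss, _mm_cvtss_f32, _mm_storeu_ps

// Hypothetical helper: convert d to float and place it in the low element of
// a __m128, zeroing the three upper elements. _mm_set_ps lists its arguments
// from the highest element down to the lowest.
inline __m128 double_to_low_float(double d) {
    return _mm_set_ps(0.f, 0.f, 0.f, static_cast<float>(d));
}

// Hypothetical alternative producing the same vector: _mm_set_ss also
// zeroes elements 1..3.
inline __m128 double_to_low_float_ss(double d) {
    return _mm_set_ss(static_cast<float>(d));
}
```

_mm_cvtss_f32(double_to_low_float(d)) reads back static_cast<float>(d), and storing either vector with _mm_storeu_ps shows zeros in elements 1 to 3.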

You don't want to point a __m128d* at a scalar double, because the vector is twice as wide as a double: dereferencing it would read 16 bytes from an 8-byte object. If any cast had made sense, it would have been (__m128d)d, or the static_cast or reinterpret_cast equivalent of that.
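When the double is already in memory, a safe pointer-based alternative is to pass its address to _mm_load_sd, which reads only 8 bytes into the low element and zeroes the upper one. This sketch assumes SSE2; load_double and bits_as_m128 are hypothetical helper names. The second helper uses _mm_castpd_ps, which reinterprets the same 128 bits as a __m128 without any numeric conversion, so the float elements would hold the raw bit pattern of the double rather than its value.

```cpp
#include <emmintrin.h>  // SSE2: __m128d, _mm_load_sd, _mm_castpd_ps

// Hypothetical helper: load one double from memory into the low element of a
// __m128d (upper element zeroed). Reads exactly sizeof(double) bytes.
inline __m128d load_double(const double* p) {
    return _mm_load_sd(p);
}

// Hypothetical helper: reinterpret the 128 bits of a __m128d as a __m128.
// This is a bit-level cast (no instructions, no double -> float conversion).
inline __m128 bits_as_m128(__m128d v) {
    return _mm_castpd_ps(v);
}
```

Casting back with _mm_castps_pd and extracting with _mm_cvtsd_f64 recovers the original double unchanged.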

Unfortunately, AFAIK there's no way to just cast a double to a __m128d with an undefined upper element, even though scalar float / double and __m128d naturally live in XMM registers. See "How to merge a scalar into a vector without the compiler wasting an instruction zeroing upper elements? Design limitation in Intel's intrinsics?"

Some compilers (probably still only clang) can optimize away the zero-extension or broadcast into a __m128d vector if you use only scalar intrinsics and then extract a scalar result. Other compilers actually waste instructions on the upper elements.
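The "scalar intrinsics in, scalar result out" pattern looks roughly like this sketch (SSE2 assumed; sqrt_via_sse2 is a hypothetical name, while _mm_set_sd, _mm_sqrt_sd and _mm_cvtsd_f64 are real intrinsics). Only the low element carries meaning throughout, so the zeroing done by _mm_set_sd is the part a compiler may or may not eliminate; comparing the generated asm across compilers is the way to check, and no particular codegen is claimed here.

```cpp
#include <emmintrin.h>  // SSE2: _mm_set_sd, _mm_sqrt_sd, _mm_cvtsd_f64

// Hypothetical example: every step works only on the low element, and the
// result is extracted as a plain double at the end.
inline double sqrt_via_sse2(double d) {
    __m128d v = _mm_set_sd(d);  // {d, 0.0}: low = d, upper zeroed
    v = _mm_sqrt_sd(v, v);      // low = sqrt(low of 2nd arg); upper copied from 1st arg
    return _mm_cvtsd_f64(v);    // extract the low element
}
```

For instance, sqrt_via_sse2(2.25) returns 1.5.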