I'm working with C intrinsics (SSE/SSE2 only) right now, and I have an __m128 value holding 4 floats. Is there any way to shift, shuffle, or move the uppermost 32 bits into the lowest 32 bits?
Example: I have {1.0f, 2.0f, 3.0f, 4.0f} in an __m128 and I want to turn it into {4.0f, 2.0f, 3.0f, 1.0f} (the values in between may be overwritten).
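You can do that with the shufps xmm, xmm, imm8 instruction (exposed in C as the _mm_shuffle_ps intrinsic). Its 8-bit immediate statically selects, for each of the four 32-bit output elements, which input element is copied there: bits 1:0 choose the source of output element 0, bits 3:2 output element 1, and so on. The two lower output elements are taken from the first operand and the two upper ones from the second, so passing the same register twice lets any element go anywhere. For {4.0f, 2.0f, 3.0f, 1.0f} the selectors for output elements 0 to 3 are 3, 1, 2, 0, which packs to 0x27 (_MM_SHUFFLE(0, 2, 1, 3)).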
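#include <stdio.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_shuffle_ps, _MM_SHUFFLE */

int main(void) {
    float array[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    __m128 data;

    printf("before : %.1f %.1f %.1f %.1f\n", array[0], array[1], array[2], array[3]);

    data = _mm_loadu_ps(array);  /* unaligned load; element 0 is array[0] */
    /* _MM_SHUFFLE(0, 2, 1, 3) == 0x27: out0 = in3, out1 = in1,
       out2 = in2, out3 = in0, i.e. swap the lowest and highest floats */
    data = _mm_shuffle_ps(data, data, _MM_SHUFFLE(0, 2, 1, 3));
    _mm_storeu_ps(array, data);

    printf("after : %.1f %.1f %.1f %.1f\n", array[0], array[1], array[2], array[3]);
    return 0;
}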
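If the shuffle is needed in several places, it can be wrapped in small helpers. The sketch below assumes an x86 target with SSE enabled and a C11 compiler (for _Static_assert); the helper names swap_low_high and top_lane are hypothetical, not part of any library. swap_low_high performs the same lane swap as the program above, and top_lane shows a common follow-up: broadcasting the top element to every lane and then reading lane 0 as a scalar with _mm_cvtss_f32.

```c
#include <assert.h>     /* assert(), for checking the helpers in a caller */
#include <xmmintrin.h>  /* SSE: __m128, _mm_shuffle_ps, _mm_cvtss_f32, _MM_SHUFFLE */

/* _MM_SHUFFLE(d3, d2, d1, d0) lists the source element for each output
   element, highest first, packing two bits per output element. */
_Static_assert(_MM_SHUFFLE(0, 2, 1, 3) == 0x27, "same immediate as 0x27");

/* Hypothetical helper: swap elements 0 and 3, keep elements 1 and 2.
   {a, b, c, d} -> {d, b, c, a} (element 0 listed first). */
static inline __m128 swap_low_high(__m128 v)
{
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 2, 1, 3));
}

/* Hypothetical helper: return the top element as a scalar float.
   _MM_SHUFFLE(3, 3, 3, 3) copies element 3 into every element, and
   _mm_cvtss_f32 then returns element 0 of the result. */
static inline float top_lane(__m128 v)
{
    return _mm_cvtss_f32(_mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 3, 3, 3)));
}
```

Because the question allows the middle elements to be overwritten, the broadcast in top_lane is also an acceptable way to get 4.0f into the lowest slot when only that scalar is needed afterwards.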