If I have an AVX register with 4 doubles in it and I want to store the reverse of this in another register, is it possible to do this with a single intrinsic?
For example, if I had 4 floats in an SSE register, I could use:
_mm_shuffle_ps(A,A,_MM_SHUFFLE(0,1,2,3));
Can I do this using, maybe, _mm256_permute2f128_pd()? I don't think you can address each individual double using that intrinsic.
You actually need 2 permutes to do this:
_mm256_permute2f128_pd() only permutes in 128-bit chunks.
_mm256_permute_pd() does not permute across 128-bit boundaries.
So you need to use both:
#include <immintrin.h>

inline __m256d reverse(__m256d x){
    x = _mm256_permute2f128_pd(x,x,1); // swap the two 128-bit halves
    x = _mm256_permute_pd(x,5);        // swap the two doubles within each half
    return x;
}
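To see why the immediates 1 and 5 give a full reversal, it can help to model the two permutes on plain arrays. The following is only a sketch: Vec4, model_permute2f128_pd, model_permute_pd and model_reverse are hypothetical names made up for illustration, not part of any intrinsics header, and the code mimics the documented lane selection in scalar C++ rather than using AVX.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper type: element 0 is the lowest lane of the register.
using Vec4 = std::array<double, 4>;

// Scalar model (an assumption for illustration) of _mm256_permute2f128_pd(a, b, imm).
// Bits 1:0 of imm choose the source of the low 128-bit half, bits 5:4 the high half:
// 0 = a low, 1 = a high, 2 = b low, 3 = b high. Bit 3 / bit 7 zero that half instead.
Vec4 model_permute2f128_pd(const Vec4& a, const Vec4& b, int imm) {
    assert(imm >= 0 && imm <= 255);  // the immediate is 8 bits wide
    Vec4 r{};                        // starts out all zeros
    for (int half = 0; half < 2; ++half) {
        int ctrl = (imm >> (4 * half)) & 0xF;
        if (ctrl & 0x8) continue;                // this half stays zero
        const Vec4& src = (ctrl & 0x2) ? b : a;  // which input register
        int from = (ctrl & 0x1) ? 2 : 0;         // first index of the chosen half
        r[2 * half]     = src[from];
        r[2 * half + 1] = src[from + 1];
    }
    return r;
}

// Scalar model (an assumption for illustration) of _mm256_permute_pd(a, imm).
// Bit i of imm picks, for element i, the low (0) or high (1) double of the
// same 128-bit half, so no element ever crosses the 128-bit boundary.
Vec4 model_permute_pd(const Vec4& a, int imm) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i) {
        int base = (i / 2) * 2;  // first index of the half that holds element i
        r[i] = a[base + ((imm >> i) & 1)];
    }
    return r;
}

// Same two steps as reverse() above, on the scalar model.
Vec4 model_reverse(const Vec4& x) {
    Vec4 swapped = model_permute2f128_pd(x, x, 1);  // {x2, x3, x0, x1}
    return model_permute_pd(swapped, 5);            // {x3, x2, x1, x0}
}
```

With x = {10, 11, 12, 13}, the first step yields {12, 13, 10, 11} and the second yields {13, 12, 11, 10}. The immediate 5 is binary 0101, which takes the high double for elements 0 and 2 and the low double for elements 1 and 3, i.e. it swaps the pair inside each half.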
Test:
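#include <iostream>
using namespace std;

int main(){
    // _mm256_set_pd takes its arguments from the highest element down, so element 0 is 10.
    __m256d x = _mm256_set_pd(13,12,11,10);
    // Note: the .m256d_f64 member is MSVC-specific.
    cout << x.m256d_f64[0] << " " << x.m256d_f64[1] << " " << x.m256d_f64[2] << " " << x.m256d_f64[3] << endl;
    x = reverse(x);
    cout << x.m256d_f64[0] << " " << x.m256d_f64[1] << " " << x.m256d_f64[2] << " " << x.m256d_f64[3] << endl;
}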
Output:
10 11 12 13
13 12 11 10