Reverse an AVX register containing doubles using a single AVX intrinsic


If I have an AVX register holding 4 doubles and I want to store them in reverse order in another register, is it possible to do this with a single intrinsic?

For example, if I had 4 floats in an SSE register, I could use:

_mm_shuffle_ps(A,A,_MM_SHUFFLE(0,1,2,3));

Can I do this with something like _mm256_permute2f128_pd()? I don't think that intrinsic can address each individual double.


Solution

  • With AVX alone, you actually need 2 permutes to do this:

    • _mm256_permute2f128_pd() only permutes in 128-bit chunks.
    • _mm256_permute_pd() does not permute across 128-bit boundaries.

    So you need to use both:

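    #include <immintrin.h>

    inline __m256d reverse(__m256d x){
        x = _mm256_permute2f128_pd(x,x,1);  // swap the 128-bit halves: {x2, x3, x0, x1}
        x = _mm256_permute_pd(x,5);         // swap the doubles inside each half: {x3, x2, x1, x0}
        return x;
    }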
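To see why the immediates 1 and 5 give a full reversal, it can help to model the two permutes on plain arrays. The following is only a scalar sketch, not the intrinsics themselves: the names `Vec4`, `model_permute2f128`, `model_permute_pd`, `model_reverse` and `check_model` are hypothetical, element 0 is taken to be the lowest lane, and the zeroing bits of `_mm256_permute2f128_pd` are ignored.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<double, 4>;  // element 0 models the lowest lane

// Scalar model of _mm256_permute2f128_pd(a, b, imm), ignoring the zeroing bits.
inline Vec4 model_permute2f128(const Vec4& a, const Vec4& b, int imm) {
    // The four selectable 128-bit halves: a.low, a.high, b.low, b.high.
    const double halves[4][2] = {{a[0], a[1]}, {a[2], a[3]}, {b[0], b[1]}, {b[2], b[3]}};
    const int lo = imm & 3;         // bits 1:0 pick the low half of the result
    const int hi = (imm >> 4) & 3;  // bits 5:4 pick the high half of the result
    return Vec4{{halves[lo][0], halves[lo][1], halves[hi][0], halves[hi][1]}};
}

// Scalar model of _mm256_permute_pd(a, imm): bit i picks the lower (0) or
// upper (1) double of the 128-bit half that result lane i belongs to.
inline Vec4 model_permute_pd(const Vec4& a, int imm) {
    return Vec4{{a[(imm >> 0) & 1], a[(imm >> 1) & 1],
                 a[2 + ((imm >> 2) & 1)], a[2 + ((imm >> 3) & 1)]}};
}

// Same two steps as reverse(), expressed on the model.
inline Vec4 model_reverse(const Vec4& x) {
    const Vec4 swapped = model_permute2f128(x, x, 1);  // {x2, x3, x0, x1}
    return model_permute_pd(swapped, 5);               // {x3, x2, x1, x0}
}

// Small self-check on the model.
inline void check_model() {
    const Vec4 r = model_reverse(Vec4{{10.0, 11.0, 12.0, 13.0}});
    assert(r[0] == 13.0 && r[1] == 12.0 && r[2] == 11.0 && r[3] == 10.0);
}
```

Immediate 1 selects the high half of the first operand for the low half of the result and the low half for the high half. Immediate 5 (binary 0101) then swaps the pair inside each half.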
    

    Test:

    #include <iostream>

    int main(){
        __m256d x = _mm256_set_pd(13,12,11,10);  // arguments run from element 3 down to element 0
        double v[4];

        _mm256_storeu_pd(v, x);  // portable read-back; .m256d_f64 is MSVC-specific
        std::cout << v[0] << "  " << v[1] << "  " << v[2] << "  " << v[3] << std::endl;
        x = reverse(x);
        _mm256_storeu_pd(v, x);
        std::cout << v[0] << "  " << v[1] << "  " << v[2] << "  " << v[3] << std::endl;
    }
    

    Output:

    10  11  12  13
    13  12  11  10
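
A side note that goes beyond the answer above: the two-permute requirement applies to AVX on its own. If AVX2 is available (an assumption, since the question only mentions AVX), `_mm256_permute4x64_pd` can move any double to any lane, so a single intrinsic should be enough. A minimal sketch follows. `reverse_avx2`, `Lanes`, `model_permute4x64` and `check_permute4x64_model` are hypothetical names, and the scalar model is there only to show how the immediate 0x1B is decoded.

```cpp
#include <array>
#include <cassert>

#ifdef __AVX2__
#include <immintrin.h>

// Assumes AVX2 (e.g. compiled with -mavx2); not part of the original answer.
inline __m256d reverse_avx2(__m256d x) {
    return _mm256_permute4x64_pd(x, 0x1B);  // 0x1B == _MM_SHUFFLE(0,1,2,3)
}
#endif

using Lanes = std::array<double, 4>;  // element 0 models the lowest lane

// Scalar model of _mm256_permute4x64_pd(a, imm):
// result lane i takes source lane (imm >> (2 * i)) & 3.
inline Lanes model_permute4x64(const Lanes& a, int imm) {
    return Lanes{{a[imm & 3], a[(imm >> 2) & 3], a[(imm >> 4) & 3], a[(imm >> 6) & 3]}};
}

// Small self-check on the model.
inline void check_permute4x64_model() {
    const Lanes r = model_permute4x64(Lanes{{10.0, 11.0, 12.0, 13.0}}, 0x1B);
    assert(r[0] == 13.0 && r[1] == 12.0 && r[2] == 11.0 && r[3] == 10.0);
}
```

The immediate 0x1B is binary 00 01 10 11, so result lanes 0 to 3 read source lanes 3, 2, 1 and 0. On hardware limited to AVX, the two-permute `reverse()` is still the way to go.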