complex-numbers intrinsics avx avx2

Unpacking real and imaginary parts of complex numbers into separate ymm registers


I need to read a sequence of complex single-precision numbers, stored as [real1, imag1, real2, imag2, ...], into ymm registers and unpack them so that, say, ymm0 contains [real1, real2, real3, ...] and ymm1 contains [imag1, imag2, imag3, ...]. The following code works, but it uses four lane-crossing shuffles. Is there a more efficient way to accomplish this?

    // the negatives here stand in for imaginary parts
    float _f[] = {1, -1, 2, -2, 3, -3, 4, -4, 5, -5, 6, -6, 7, -7, 8, -8};

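    // permutation indices: even slots (reals) first, then odd slots (imaginaries)
    int i[] = {0, 2, 4, 6, 1, 3, 5, 7};

    __m256 a = _mm256_loadu_ps(_f);      // r1 i1 r2 i2 r3 i3 r4 i4
    __m256 b = _mm256_loadu_ps(_f+8);    // r5 i5 r6 i6 r7 i7 r8 i8

    __m256i x = _mm256_loadu_si256((const __m256i*)i);

    // lane-crossing (AVX2): c = r1 r2 r3 r4 i1 i2 i3 i4, likewise for d
    __m256 c = _mm256_permutevar8x32_ps(a, x);
    __m256 d = _mm256_permutevar8x32_ps(b, x);

    // lane-crossing: combine the low halves (reals) and the high halves (imaginaries)
    __m256 e = _mm256_permute2f128_ps(c, d, 0x20);
    __m256 f = _mm256_permute2f128_ps(c, d, 0x31);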

At the end of this sequence, e contains the real parts and f contains the imaginary parts. My only concern is that lane-crossing shuffles can be expensive on some machines.
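
For checking any vectorized variant against the intended result, a plain scalar version can serve as a reference. This is only a minimal sketch; the function name `deinterleave_ref` and its signature are made up for illustration and are not part of the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference (hypothetical helper): split n interleaved complex
   floats [re0, im0, re1, im1, ...] into separate re[] and im[] arrays. */
void deinterleave_ref(const float *in, float *re, float *im, size_t n)
{
    assert(in != NULL && re != NULL && im != NULL);
    for (size_t k = 0; k < n; ++k) {
        re[k] = in[2 * k];      /* even slots hold the real parts */
        im[k] = in[2 * k + 1];  /* odd slots hold the imaginary parts */
    }
}
```

Storing e and f from the shuffle sequence with `_mm256_storeu_ps` and comparing them element by element against `re` and `im` is one way to confirm the ordering.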


Solution

  • As harold suggested in a comment, the code below separates the real and imaginary parts into separate vectors using only in-lane shuffles, but the element order is not sequential. Instead, e will have [real1, real5, real2, real6, real3, real7, real4, real8] and f will have the corresponding imaginary parts. This may be good enough for some applications, so I figured it was worth posting in case anybody else finds it useful.

        float _f[] = {1, -1, 2, -2, 3, -3, 4, -4, 5, -5, 6, -6, 7, -7, 8, -8};
    
        __m256 a = _mm256_loadu_ps(_f);
        __m256 b = _mm256_loadu_ps(_f+8);
    
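        // in-lane reorder to [x0, x2, x1, x3]: c = r1 r2 i1 i2 | r3 r4 i3 i4
        __m256 c = _mm256_permute_ps(a, 0xd8);
        __m256 d = _mm256_permute_ps(b, 0xd8);
    
        // in-lane interleave: e = r1 r5 r2 r6 | r3 r7 r4 r8, f likewise with imaginaries
        __m256 e = _mm256_unpacklo_ps(c,d);
        __m256 f = _mm256_unpackhi_ps(c,d);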
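
    One case where the scrambled order should not matter is a reduction: if both inputs are split with the same shuffles, matching elements still line up, and the final sum does not care about position. The sketch below sums the products z[k] * w[k] that way. The function name `complex_sum_products` is hypothetical, n is assumed to be a multiple of 8, and the `target("avx")` attribute is a GCC/Clang extension used here so the block builds without -mavx (the CPU must still support AVX at run time).

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <immintrin.h>

    /* Hypothetical helper: sum of z[k] * w[k] over n complex floats.
       Inputs are interleaved [re, im, re, im, ...]; n must be a multiple of 8. */
    __attribute__((target("avx")))
    void complex_sum_products(const float *z, const float *w, size_t n,
                              float *sum_re, float *sum_im)
    {
        assert(n % 8 == 0);
        __m256 acc_re = _mm256_setzero_ps();
        __m256 acc_im = _mm256_setzero_ps();

        for (size_t k = 0; k < 2 * n; k += 16) {
            /* Same in-lane split for both inputs, so elements still pair up. */
            __m256 za = _mm256_permute_ps(_mm256_loadu_ps(z + k), 0xd8);
            __m256 zb = _mm256_permute_ps(_mm256_loadu_ps(z + k + 8), 0xd8);
            __m256 wa = _mm256_permute_ps(_mm256_loadu_ps(w + k), 0xd8);
            __m256 wb = _mm256_permute_ps(_mm256_loadu_ps(w + k + 8), 0xd8);
            __m256 zr = _mm256_unpacklo_ps(za, zb);
            __m256 zi = _mm256_unpackhi_ps(za, zb);
            __m256 wr = _mm256_unpacklo_ps(wa, wb);
            __m256 wi = _mm256_unpackhi_ps(wa, wb);

            /* (zr + i zi)(wr + i wi) = (zr wr - zi wi) + i (zr wi + zi wr) */
            acc_re = _mm256_add_ps(acc_re,
                _mm256_sub_ps(_mm256_mul_ps(zr, wr), _mm256_mul_ps(zi, wi)));
            acc_im = _mm256_add_ps(acc_im,
                _mm256_add_ps(_mm256_mul_ps(zr, wi), _mm256_mul_ps(zi, wr)));
        }

        /* Horizontal sum: element order inside the accumulators is irrelevant. */
        float r[8], i[8];
        _mm256_storeu_ps(r, acc_re);
        _mm256_storeu_ps(i, acc_im);
        *sum_re = 0.0f;
        *sum_im = 0.0f;
        for (int j = 0; j < 8; ++j) {
            *sum_re += r[j];
            *sum_im += i[j];
        }
    }
    ```

    If the results have to be written back in sequential order, the permutation has to be undone first, so this only helps when the order is consumed consistently or discarded.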
    

    EDIT: As Peter Cordes pointed out, the following even shorter solution needs only two in-lane shuffles. It produces c = [real1, real2, real5, real6, real3, real4, real7, real8], and d holds the corresponding imaginary parts.

        float _f[] = {1, -1, 2, -2, 3, -3, 4, -4, 5, -5, 6, -6, 7, -7, 8, -8};
    
        __m256 a = _mm256_loadu_ps(_f);
        __m256 b = _mm256_loadu_ps(_f+8);
    
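        // in-lane: even slots of a, then of b, per lane -> r1 r2 r5 r6 | r3 r4 r7 r8
        __m256 c = _mm256_shuffle_ps(a, b, 0x88);
        // in-lane: odd slots of a, then of b, per lane  -> i1 i2 i5 i6 | i3 i4 i7 i8
        __m256 d = _mm256_shuffle_ps(a, b, 0xdd);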
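
    If the sequential order is required after all, one option on AVX2 is to follow each `_mm256_shuffle_ps` with a single lane-crossing `_mm256_permute4x64_pd` that swaps the two middle 64-bit chunks (immediate 0xd8). That comes to two in-lane plus two lane-crossing shuffles per 8 complex numbers, instead of four lane-crossing ones. Whether that is a net win depends on the CPU; it has not been benchmarked here. A minimal sketch, where the function name `deinterleave_avx2` is hypothetical, n is assumed to be a multiple of 8, and the `target("avx2")` attribute is a GCC/Clang extension:

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <immintrin.h>

    /* Hypothetical helper: split n interleaved complex floats into re[] and
       im[] in sequential order. n must be a multiple of 8; requires AVX2. */
    __attribute__((target("avx2")))
    void deinterleave_avx2(const float *in, float *re, float *im, size_t n)
    {
        assert(n % 8 == 0);
        for (size_t k = 0; k < n; k += 8) {
            __m256 a = _mm256_loadu_ps(in + 2 * k);
            __m256 b = _mm256_loadu_ps(in + 2 * k + 8);

            /* In-lane: c = r1 r2 r5 r6 | r3 r4 r7 r8, d likewise for imag. */
            __m256 c = _mm256_shuffle_ps(a, b, 0x88);
            __m256 d = _mm256_shuffle_ps(a, b, 0xdd);

            /* Lane-crossing: treat each vector as four 64-bit chunks and
               reorder them as [0, 2, 1, 3] to restore sequential order. */
            c = _mm256_castpd_ps(_mm256_permute4x64_pd(_mm256_castps_pd(c), 0xd8));
            d = _mm256_castpd_ps(_mm256_permute4x64_pd(_mm256_castps_pd(d), 0xd8));

            _mm256_storeu_ps(re + k, c);
            _mm256_storeu_ps(im + k, d);
        }
    }
    ```

    The casts between `__m256` and `__m256d` only reinterpret the register and generate no instructions.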