c simd complex-numbers avx2

Multiplication of complex numbers using AVX2+FMA3


I have found some solutions where each AVX2 register holds both the real and the imaginary parts of the complex numbers. I am interested in a solution where each AVX2 register holds either the real parts or the imaginary parts.
Assume we have 4 AVX2 registers: R1, I1, R2, I2.
Registers R1 and I1 together form 4 complex numbers, and the same applies to R2 and I2. Now I want to multiply the 4 complex numbers in R1, I1 by the 4 complex numbers in R2, I2. What would be the most efficient way to do this? Besides AVX2, FMA3 can be used as well.
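
For reference, the lane-wise arithmetic being asked for can be written out as below. The lane index k is notation introduced here and is not part of the original question; it assumes double-precision elements, so that each 256-bit register holds 4 values.

```latex
\begin{aligned}
(R_1[k] + i\,I_1[k])\,(R_2[k] + i\,I_2[k])
  &= \bigl(R_1[k]\,R_2[k] - I_1[k]\,I_2[k]\bigr) \\
  &\quad + i\,\bigl(R_1[k]\,I_2[k] + I_1[k]\,R_2[k]\bigr),
  \qquad k = 0,\dots,3
\end{aligned}
```

Each lane therefore needs four multiplications plus one subtraction and one addition, and in this split layout no data has to move between lanes.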


Solution

  • You wrote that you have AVX2. All Intel and AMD processors with AVX2 also support FMA3, so I would do it like this:

    #include <immintrin.h>

    // 4 FP64 complex numbers stored in 2 AVX vectors,
    // de-interleaved into real and imaginary vectors
    typedef struct
    {
        __m256d r, i;
    } Complex4;
    
    // Multiply 4 complex numbers by another 4 complex numbers:
    // (a.r + i*a.i) * (b.r + i*b.i)
    Complex4 mul4( Complex4 a, Complex4 b )
    {
        Complex4 prod;
        prod.r = _mm256_mul_pd( a.r, b.r );             // a.r * b.r
        prod.i = _mm256_mul_pd( a.r, b.i );             // a.r * b.i
        prod.r = _mm256_fnmadd_pd( a.i, b.i, prod.r );  // a.r*b.r - a.i*b.i
        prod.i = _mm256_fmadd_pd( a.i, b.r, prod.i );   // a.r*b.i + a.i*b.r
        return prod;
    }
    

    Or, if you are targeting that one VIA processor which doesn't have FMA, replace the two FMA intrinsic lines with the following:

    prod.r = _mm256_sub_pd( prod.r, _mm256_mul_pd( a.i, b.i ) );
    prod.i = _mm256_add_pd( prod.i, _mm256_mul_pd( a.i, b.r ) );
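
To sanity-check either vector variant, it can help to compare against a plain scalar loop over the same split layout. The following is a minimal sketch, not part of the original answer: the function name `cmul_split_ref` and its parameter names are hypothetical, and it assumes the real and imaginary parts live in separate `double` arrays of length `n`.

```c
#include <stddef.h>

/* Scalar reference for complex multiplication in split (de-interleaved)
   layout. Hypothetical helper for testing; all names are illustrative.
   ar/ai and br/bi hold the real/imaginary parts of the two inputs,
   pr/pi receive the real/imaginary parts of the products. */
void cmul_split_ref(const double *ar, const double *ai,
                    const double *br, const double *bi,
                    double *pr, double *pi, size_t n)
{
    for (size_t k = 0; k < n; ++k) {
        /* (ar + i*ai) * (br + i*bi); temporaries keep the loop
           correct even if an output array aliases an input array */
        double re = ar[k] * br[k] - ai[k] * bi[k];
        double im = ar[k] * bi[k] + ai[k] * br[k];
        pr[k] = re;
        pi[k] = im;
    }
}
```

When comparing against the FMA version, allow a small tolerance rather than testing for exact equality: a fused multiply-add rounds once where a separate multiply and add round twice, so the last bits of the results can differ.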