Search code examples
cassemblycomplex-numbersavxicc

Does ICC satisfy C99 specs for multiplication of complex numbers?


Consider this simple code:

#include <complex.h>
complex float f(complex float x) {
  return x*x;
}

If you compile it with -O3 -march=core-avx2 -fp-model strict using the Intel Compiler you get:

f:
        vmovsldup xmm1, xmm0                                    #3.12
        vmovshdup xmm2, xmm0                                    #3.12
        vshufps   xmm3, xmm0, xmm0, 177                         #3.12
        vmulps    xmm4, xmm1, xmm0                              #3.12
        vmulps    xmm5, xmm2, xmm3                              #3.12
        vaddsubps xmm0, xmm4, xmm5                              #3.12
        ret 

This is much simpler code than you get from both gcc and clang and also much simpler than the code you will find online for multiplying complex numbers. It doesn't, for example appear explicitly to deal with complex NaN or infinities.

Does this assembly meet the specs for C99 complex multiplication?


Solution

  • The code is non-conforming.

    Annex G, Section 5.1, Paragraph 4 reads

    The * and / operators satisfy the following infinity properties for all real, imaginary, and complex operands:

    — if one operand is an infinity and the other operand is a nonzero finite number or an infinity, then the result of the * operator is an infinity;

    So if z = a * ib is infinite and w = c * id is infinite, the number z * w must be infinite.

    The same annex, Section 3, Paragraph 1 defines what it means for a complex number to be infinite:

    A complex or imaginary value with at least one infinite part is regarded as an infinity (even if its other part is a NaN).

    So z is infinite if either a or b are.
    This is indeed a sensible choice as it reflects the mathematical framework1.

    However if we let z = ∞ + i∞ (an infinite value) and w = i∞ (and infinite value) the result for the Intel code is z * w = NaN + iNaN due to the ∞ · 0 intermediates2.

    This suffices to label it as non-conforming.


    We can further confirm this by taking a look at the footnote on the first quote (the footnote was not reported here), it mentions the CX_LIMITED_RANGE pragma directive.

    Section 7.3.4, Paragraph 1 reads

    The usual mathematical formulas for complex multiply, divide, and absolute value are problematic because of their treatment of infinities and because of undue overflow and underflow. The CX_LIMITED_RANGE pragma can be used to inform the implementation that (where the state is ‘‘on’’) the usual mathematical formulas [that produces NaNs] are acceptable.

    Here the standard committee is trying to alleviate the huge mole of work for the complex multiplication (and division).
    In fact GCC has a flag to control this behaviour:

    -fcx-limited-range
    When enabled, this option states that a range reduction step is not needed when performing complex division.

    Also, there is no checking whether the result of a complex multiplication or division is NaN + I*NaN, with an attempt to rescue the situation in that case.

    The default is -fno-cx-limited-range, but is enabled by -ffast-math.
    This option controls the default setting of the ISO C99 CX_LIMITED_RANGE pragma.

    It this option alone that makes GCC generate slow code and additional checks, without it the code it generate has the same flaws of Intel's one (I translated the source to C++)

    f(std::complex<float>):
            movq    QWORD PTR [rsp-8], xmm0
            movss   xmm0, DWORD PTR [rsp-8]
            movss   xmm2, DWORD PTR [rsp-4]
            movaps  xmm1, xmm0
            movaps  xmm3, xmm2
            mulss   xmm1, xmm0
            mulss   xmm3, xmm2
            mulss   xmm0, xmm2
            subss   xmm1, xmm3
            addss   xmm0, xmm0
            movss   DWORD PTR [rsp-16], xmm1
            movss   DWORD PTR [rsp-12], xmm0
            movq    xmm0, QWORD PTR [rsp-16]
            ret
    

    Without it the code is

    f(std::complex<float>):
            sub     rsp, 40
            movq    QWORD PTR [rsp+24], xmm0
            movss   xmm3, DWORD PTR [rsp+28]
            movss   xmm2, DWORD PTR [rsp+24]
            movaps  xmm1, xmm3
            movaps  xmm0, xmm2
            call    __mulsc3
            movq    QWORD PTR [rsp+16], xmm0
            movss   xmm0, DWORD PTR [rsp+16]
            movss   DWORD PTR [rsp+8], xmm0
            movss   xmm0, DWORD PTR [rsp+20]
            movss   DWORD PTR [rsp+12], xmm0
            movq    xmm0, QWORD PTR [rsp+8]
            add     rsp, 40
            ret
    

    and the __mulsc3 function is practically the same the standard C99 recommends for complex multiplication.
    It includes the above mentioned checks.


    1 Where the modulus of a number is extended from the real case |z| to the complex one ‖z‖, keeping the definition of infinite as the result of unbounded limits. Simply put, in the complex plane there is a whole circumference of infinite values and it takes just one "coordinate" to be infinite to get an infinite modulus.

    2 The situation get worst if we remember that z = NaN + i∞ or z = ∞ + iNaN are valid infinite values