Tags: c++, gcc, x86, floating-point, x87

Why does gcc compare seemingly equal floating point values as different with "-fexcess-precision=standard"?


Look at this snippet:

int main() {
    double v = 1.1;
    return v == 1.1;
}

In 32-bit builds, this program returns 0 if -fexcess-precision=standard is specified. Without the flag, it returns 1.

Why is there a difference? Looking at the assembly code (godbolt), it seems that with -fexcess-precision=standard, gcc uses 1.1 as a long double constant (it loads the constant as TBYTE). Why does it do so?
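To see why loading .LC1 as a TBYTE changes the result, the two comparisons can be written out with explicit types. This is only a sketch: the function names are hypothetical, and it assumes that comparing the two x87 register values behaves like a `long double` comparison. Assignment to a `double` variable is used on purpose, because an assignment must discard any excess precision.

```cpp
#include <cfloat>

// Hypothetical helpers that mimic the two listings with explicit types.

// Default listing: both operands come from the double constant .LC0.
bool compare_like_default_listing() {
    double v = 1.1;        // rounded to double and stored (QWORD)
    double literal = 1.1;  // the same double constant as .LC0
    return static_cast<long double>(v) == static_cast<long double>(literal);
}

// -fexcess-precision=standard listing: v is still a rounded double, but the
// literal is loaded as the long double nearest to 1.1 (.LC1, TBYTE).
bool compare_like_standard_listing() {
    double v = 1.1;
    return static_cast<long double>(v) == 1.1L;
}

// The second comparison can only fail if long double carries more
// significand bits than double (64 vs. 53 with the x87 format).
bool long_double_is_wider() {
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}
```

Where `long double` is the 80-bit x87 format, `compare_like_standard_listing()` should return false, which matches the 0 returned by the program above.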

At first I thought it was a bug, but judging by this GCC bug comment, the behavior seems to be intentional, or at least not unexpected.

Is this a QoI issue? I understand that the comparison is performed in long double precision, but my 1.1 is still not a long double literal. The odd thing is that if I cast the 1.1 in the comparison to double (which it already is), the issue goes away.
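The cast helps because a cast, like an assignment or an initialization, has to remove any excess precision, so the literal is rounded to the same double that v holds. A minimal sketch of that, plus an equivalent variant with a named constant; the function and constant names are hypothetical:

```cpp
// Casting the literal forces it to be rounded to double, so both sides of
// == hold the same double value even if the comparison itself is carried
// out in long double precision.
bool compare_with_cast() {
    double v = 1.1;
    return v == static_cast<double>(1.1);
}

// Initializing a double from the literal rounds it in the same way, which
// avoids repeating the cast at every comparison.
bool compare_with_named_constant() {
    const double kOnePointOne = 1.1;
    double v = 1.1;
    return v == kOnePointOne;
}
```

In both variants the comparison is between two doubles, and two doubles compare the same whether the comparison is done in double or in long double precision, so the result no longer depends on FLT_EVAL_METHOD.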

(Another odd thing is that GCC performs the load and compare twice; see the two fucomip instructions. It does this even in 64-bit mode. I understand that optimization is turned off in my godbolt link, but there is still only one comparison in my code, so why does GCC compare twice?)
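Regarding the two comparisons: `==` on floating-point values must yield false when either operand is NaN (an unordered comparison), and `fucomip` signals that case by setting the parity flag together with the zero and carry flags. In both listings, `setnp al` records whether the operands were ordered, and `cmovne eax, edx` zeroes the result when the zero flag is clear; at -O0, GCC apparently recomputes the comparison for each of these two flag tests instead of reusing the flags. The sketch below (function name hypothetical) expresses the same two conditions with `<cmath>` classification macros:

```cpp
#include <cmath>

// Mirrors the two flag tests in the listings; behaves like a == b.
bool equal_via_two_tests(double a, double b) {
    // setnp: the parity flag is clear only if neither operand is NaN.
    bool ordered = !std::isunordered(a, b);
    // cmovne: the zero flag is set when the operands are equal and also when
    // they are unordered; islessgreater is false in exactly those cases.
    bool zero_flag = !std::islessgreater(a, b);
    return ordered && zero_flag;
}
```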

Here's the asm code, without -fexcess-precision=standard:

main:
        push    ebp
        mov     ebp, esp
        and     esp, -8
        sub     esp, 16
        fld     QWORD PTR .LC0
        fstp    QWORD PTR [esp+8]
        fld     QWORD PTR [esp+8]
        fld     QWORD PTR .LC0
        fucomip st, st(1)
        fstp    st(0)
        setnp   al
        mov     edx, 0
        fld     QWORD PTR [esp+8]
        fld     QWORD PTR .LC0
        fucomip st, st(1)
        fstp    st(0)
        cmovne  eax, edx
        movzx   eax, al
        leave
        ret
.LC0:
        .long   -1717986918
        .long   1072798105

And here it is with the flag:

main:
        push    ebp
        mov     ebp, esp
        and     esp, -8
        sub     esp, 16
        fld     QWORD PTR .LC0
        fstp    QWORD PTR [esp+8]
        fld     QWORD PTR [esp+8]
        fld     TBYTE PTR .LC1
        fucomip st, st(1)
        setnp   al
        mov     edx, 0
        fld     TBYTE PTR .LC1
        fucomip st, st(1)
        fstp    st(0)
        cmovne  eax, edx
        movzx   eax, al
        leave
        ret
.LC0:
        .long   -1717986918
        .long   1072798105
.LC1:
        .long   -858993459
        .long   -1932735284
        .long   16383
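For reference, the `.long` directives in the two listings can be decoded to confirm which constants are being compared. The sketch below assumes that the first `.long` is the low word (as on little-endian x86) and that `double` is IEEE 754 binary64; the 80-bit decoder only handles normal numbers and is exact only where `long double` has a 64-bit significand. The function and constant names are hypothetical.

```cpp
#include <cfloat>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559, "assumes IEEE 754 double");

// True where long double is the x87 80-bit format (64-bit significand).
constexpr bool kLongDoubleIsX87Extended = LDBL_MANT_DIG == 64;

// .LC0: two .long words, low word first, forming an IEEE 754 double.
double decode_double(std::int32_t low, std::int32_t high) {
    std::uint64_t bits = (std::uint64_t{static_cast<std::uint32_t>(high)} << 32) |
                         static_cast<std::uint32_t>(low);
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// .LC1: 64-bit significand (explicit integer bit) in the first two words,
// then the sign bit and 15-bit exponent (bias 16383) in the third word.
long double decode_x87_extended(std::int32_t low, std::int32_t high,
                                std::int32_t sign_and_exponent) {
    std::uint64_t significand =
        (std::uint64_t{static_cast<std::uint32_t>(high)} << 32) |
        static_cast<std::uint32_t>(low);
    int exponent = sign_and_exponent & 0x7FFF;
    // For normal numbers: value = significand * 2^(exponent - 16383 - 63).
    long double magnitude =
        std::ldexp(static_cast<long double>(significand), exponent - 16383 - 63);
    return (sign_and_exponent & 0x8000) ? -magnitude : magnitude;
}
```

Decoded this way, .LC0 is the bit pattern 0x3FF199999999999A, the double nearest 1.1, while .LC1 has exponent 0x3FFF and significand 0x8CCCCCCCCCCCCCCD, the 80-bit value nearest 1.1. The two differ in the bits beyond the 53-bit significand of a double.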

Solution

  • In C, it is permitted (as indicated via FLT_EVAL_METHOD) for a floating-point literal to hold a value with more precision than its type permits, and at the same time for floating-point operators to be evaluated in higher precision than the operand types permit.

    In that case v == 1.1 can be false: the literal 1.1, although of type double, is not rounded to double precision, and == compares it in that higher precision against the stored value of v, which must still be rounded to a value representable as a double.

    In C++, floating-point operations may still be evaluated in higher precision, but the value of a floating-point literal still has to be rounded to a value representable in its type.

    However, this rule interacts incorrectly with the specification incorporated from C, such as FLT_EVAL_METHOD, and deviates from C for no apparent reason, so the precision of floating-point literal values is still an open issue; see https://cplusplus.github.io/CWG/issues/2752.html and https://github.com/cplusplus/papers/issues/1584.

    Without the -fexcess-precision=standard flag, GCC doesn't behave in a standard-conforming way at all and may even treat the value of v as having higher precision than its type permits, which neither the C nor the C++ standard allows. (Assignment, casts and initialization should always force rounding to a value representable in the actual type.) As a result, v == 1.1 can be true again, because neither the literal nor the value of v obtained from it is ever rounded to a representable double value.

    All of this is typically relevant to, e.g., 32-bit x86 compilations, where FLT_EVAL_METHOD is often defined as 2, meaning that the higher precision mentioned above is always that of long double. This allows double to remain a 64-bit type while floating-point operations are performed in 80-bit precision on the x87 FPU. Normally, this choice of FLT_EVAL_METHOD makes the behavior deterministic in the sense that it is possible to tell exactly where rounding is applied, but note that GCC's default (-fexcess-precision=fast) is not consistent about whether and where rounding is applied at all.

    Given that FLT_EVAL_METHOD is 2 and given the choice of floating-point types, v == 1.1 evaluating to false is the only standard-conforming behavior under the C rules. C++ differs here, but it isn't clear whether that difference is a defect in the C++ standard, so it is somewhat understandable that GCC follows the behavior required by C.

    That v == 1.1 can evaluate to false is entirely intentional, and programmers need to be aware of excess precision unless they make sure their code only has to support implementations with FLT_EVAL_METHOD == 0, where no excess precision is applied.
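Code that must also run where FLT_EVAL_METHOD is 2 can at least detect that case at compile time. A sketch, assuming the platform's `<cmath>` defines `std::double_t` according to FLT_EVAL_METHOD as the C standard describes (`double` for 0 and 1, `long double` for 2); other values, such as -1 for "indeterminable", are not handled. The function names are hypothetical.

```cpp
#include <cfloat>
#include <cmath>
#include <type_traits>

// The evaluation method the compiler reports for this translation unit.
constexpr int flt_eval_method() { return FLT_EVAL_METHOD; }

// True when every floating-point type is evaluated in its own precision,
// so a comparison such as v == 1.1 involves no excess precision.
constexpr bool no_excess_precision() { return FLT_EVAL_METHOD == 0; }

// std::double_t names the type double expressions are evaluated in; check
// that it agrees with FLT_EVAL_METHOD for the values discussed above.
constexpr bool double_t_matches_eval_method() {
#if FLT_EVAL_METHOD == 0 || FLT_EVAL_METHOD == 1
    return std::is_same<std::double_t, double>::value;
#elif FLT_EVAL_METHOD == 2
    return std::is_same<std::double_t, long double>::value;
#else
    return true;  // other values are not checked by this sketch
#endif
}
```

In the 32-bit build with -fexcess-precision=standard from the question, `flt_eval_method()` would be expected to return 2, and comparisons against literals then have to go through a rounded double, for example via the cast mentioned in the question.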