
C/C++: Are IEEE 754 float addition/multiplication/... and int-to-float conversion standardized?


Example:

#include <math.h>
#include <stdio.h>

int main()
{
    float f1 = 1;
    float f2 = 4.f * 3.f;
    float f3 = 1.f / 1024.f;
    float f4 = 3.f - 2.f;
    printf("%a\n",f1);
    printf("%a\n",f2);
    printf("%a\n",f3);
    printf("%a\n",f4);
    return 0;
}

Output with gcc/clang, as expected:

0x1p+0
0x1.8p+3
0x1p-10
0x1p+0

As one can see, the results look "reasonable". However, other implementations might conceivably represent these numbers differently, or produce numbers that are merely very close to them.

Is it guaranteed in C and in C++ that IEEE 754 floating-point arithmetic such as addition and multiplication, as well as int-to-float conversion, yields the same results on all machines and with all compilers (i.e. that the resulting floats are all bit-wise equal)?


Solution

  • No, unless the macro __STDC_IEC_559__ is defined.

    Basically, the standard does not require IEEE 754-compatible floating point, so most compilers use whatever floating-point support the hardware provides. If the hardware provides IEEE-compatible floating point, most compilers for that target will use it and predefine the __STDC_IEC_559__ macro.

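    If the macro is defined, then Annex F of the C standard (IEC 60559, the international version of IEEE 754) requires float and double to use the IEEE 754 32-bit and 64-bit formats, which fixes their bit representation (but not their byte order), and requires +, -, * and / to be correctly rounded. Results are therefore bit-exact as long as each operation is evaluated in its declared type. Note, however, that the C standard allows intermediate results to be evaluated in a wider format, as reported by FLT_EVAL_METHOD: float arithmetic may happen at either 32-bit or 64-bit precision, and on targets such as the x87 FPU (FLT_EVAL_METHOD == 2) both float and double expressions are evaluated in long double precision. Assignments and casts still round to the declared type.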

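    The C standard requires float-to-int conversion to truncate toward zero (like the trunc function) if the result is in range for the target type. Int-to-float conversion of a value that is not exactly representable may round up or down in an implementation-defined way, but Annex F requires it to follow IEEE 754, i.e. to respect the current rounding mode. Unfortunately, IEEE 754 requires correct rounding only for basic operations such as arithmetic, square root and conversions, not for library functions like sin or exp, so their results may differ between implementations.

    The C standard also allows the compiler to contract expressions, for example to evaluate a * b + c as a single fused multiply-add with only one rounding, which can change the result; the standard's switch for this is #pragma STDC FP_CONTRACT, and GCC and Clang additionally accept -ffp-contract=off. Other value-changing transformations such as reassociation are not allowed, and most compilers that support IEEE 754 will only perform them when asked to with a command-line option such as -ffast-math.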

    Anecdotal evidence also suggests that some compilers do not define the macro even though they should, while others define it when they should not (i.e. they do not strictly follow all the requirements of IEEE 754). These cases should probably be considered compiler bugs.