Search code examples
floating-pointprecisiondivisionarbitrary-precision

IEEE 754: can casting before division cause loss of precision?


Do there exist two integers i and j that both fit into an IEEE 754 double (are smaller than DBL_MAX), but such that to_double(i)/to_double(j) is not equal to to_double(i/j), where this i/j is performed with unlimited precision?

(We can assume to_double is round-to-even if that matters).

My question is similar to Invertability of IEEE 754 floating-point division, but I don't think it is equivalent, or at least I don't see how to use it to get a counter-example to my question.


Solution

  • Yes. In a C implementation where double is IEEE-754 basic 64-bit binary floating-point (with 53-bit significands) and long double has 64-bit significands, the output of:

    #include <stdio.h>
    
    int main(void)
    {
        long double x = 0x1p154L - 0x1p101L + 0x1p100L;
        long double y = 0x1p153L + 0x1p101L - 0x1p100L;
        long double z = x / y;
        double X = x;
        double Y = y;
        double Z = X/Y;
        printf("x = %La.\n", x);
        printf("y = %La.\n", y);
        printf("z = %La.\n", z);
        printf("X = %a.\n", X);
        printf("Y = %a.\n", Y);
        printf("Z = %a.\n", Z);
        printf("(double) z = %a.\n", (double) z);
    }
    

    is:

    x = 0xf.ffffffffffffcp+150.
    y = 0x8.0000000000004p+150.
    z = 0xf.ffffffffffff4p-3.
    X = 0x1p+154.
    Y = 0x1p+153.
    Z = 0x1p+1.
    (double) z = 0x1.ffffffffffffep+0.
    

    x / y is performed with long double precision, of course, rather than infinite precision, but it captures sufficient information to show the result with infinite precision would have the same end result—inserting #include <math.h> and z = nexttowardl(z, INFINITY); changes (double) z to be 0x1.fffffffffffffp+0, but this is still not equal to Z.