Search code examples
cmathgccfloating-pointdouble

What's the first double that deviates from its corresponding long by delta?


I want to know the first double from 0d upwards that deviates by the long of the "same value" by some delta, say 1e-8. I'm failing here though. I'm trying to do this in C although I usually use managed languages, just in case. Please help.


#include <stdio.h>
#include <limits.h>
#define DELTA 1e-8

int main() {
    double d = 0; // checked, the literal is fine
    long i;
    for (i = 0L; i < LONG_MAX; i++) {
         d=i; // gcc does the cast right, i checked
         if (d-i > DELTA || d-i < -DELTA) {
              printf("%f", d);
              break;
         }
    }
}

I'm guessing that the issue is that d-i casts i to double and therefore d==i and then the difference is always 0. How else can I detect this properly -- I'd prefer fun C casting over comparing strings, which would take forever.

ANSWER: is exactly as we expected. 2^53+1 = 9007199254740993 is the first point of difference according to standard C/UNIX/POSIX tools. Thanks much to pax for his program. And I guess mathematics wins again.


Solution

  • Doubles in IEE-754 have a precision of 52 bits which means they can store numbers accurately up to (at least) 251.

    If your longs are 32-bit, they will only have the (positive) range 0 .. 231 - 1 so there is no 32-bit long that cannot be represented exactly as a double. For a 64-bit long, it will be (roughly) 252 so I'd be starting around there, not at zero.

    You can use the following program to detect where the failures start to occur. An earlier version I had relied on the fact that the last digit in a number that continuously doubles follows the sequence {2,4,8,6}. However, I opted eventually to use a known trusted tool (bc) for checking the whole number, not just the last digit.

    Keep in mind that this may be affected by the actions of sprintf() rather than the real accuracy of doubles (I don't think so personally since it had no troubles with certain numbers up to 2143).

    This is the test program I wrote:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    int main() {
        FILE *fin;
        double d = 3.0; // 2^n + 1 to avoid exact powers of 2.
        int i = 1;
        char ds[1000];
        char tst[1000];
    
        // Loop forever, rely on break to finish.
    
        while (1) {
            // Get C version of the double.
    
            sprintf (ds, "%.0f", d);
    
            // Get bc version of the double.
    
            sprintf (tst, "echo '2^%d - 1' | bc >tmpfile", i);
            system(tst);
            fin = fopen ("tmpfile", "r");
            fgets (tst, sizeof (tst), fin);
            fclose (fin);
            tst[strlen (tst) - 1] = '\0';
    
            // Check them.
    
            if (strcmp (ds, tst) != 0) {
                printf( "2^%d + 1 <-- bc failure\n", i);
                printf( "   got       [%s]\n", ds);
                printf( "   expected  [%s]\n", tst);
                break;
            }
    
            // Output for status then move to next.
    
            printf( "2^%d + 1 = %s\n", i, ds);
            d = (d - 1) * 2 + 1;  // Again, 2^n + 1.
            i++;
        }
    }
    

    This keeps going until:

    2^49 + 1 = 562949953421313
    2^50 + 1 = 1125899906842625
    2^51 + 1 = 2251799813685249
    2^52 + 1 = 4503599627370497
    2^53 + 1 <-- bc failure
       got       [9007199254740992]
       expected  [9007199254740993]
    

    which is roughly about where I'd expect it to fail.

    As an aside, I originally used numbers of the form 2n but that got me all the way up to:

    2^136 = 87112285931760246646623899502532662132736
    2^137 = 174224571863520493293247799005065324265472
    2^138 = 348449143727040986586495598010130648530944
    2^139 = 696898287454081973172991196020261297061888
    2^140 = 1393796574908163946345982392040522594123776
    2^141 = 2787593149816327892691964784081045188247552
    2^142 = 5575186299632655785383929568162090376495104
    2^143 <-- bc failure
       got       [11150372599265311570767859136324180752990210]
       expected  [11150372599265311570767859136324180752990208]
    

    with the size of a double being 8 bytes (checked with sizeof). It turned out these numbers were of the binary form 1000...000, which can be represented for far longer with doubles. That's when I switched to using 2n + 1 to get a better bit pattern (one at the start and one at the end).


    Now, just to be clear, the most reliable way would be to check every number to see which one fails first, but that's going to have quite a long runtime. This approach is the best one given knowledge of the IEEE-754 encodings.