
Why is there an inaccuracy in the following intermediate type conversion?


The following example is from page 14 of the book Discovering Modern C++ by Peter Gottschling. The author states:

To illustrate this conversion behavior, let us look at the following example:

long l = 1234567890123;
long l2 = l + 1.0f - 1.0; // imprecise
long l3 = l + (1.0f - 1.0); // precise

This leads on the author's platform to:

l2 = 1234567954431;
l3 = 1234567890123;

My question is: what exactly causes this imprecision? Is it due to the left-associativity of addition and subtraction, so that l2 is calculated as (l + 1.0f) - 1.0? If so, surely the value range of float, 3.4E +/- 38 (7 digits), covers the value 1234567890123, so to my knowledge narrowing shouldn't be an issue.


Solution

  • A float is typically 32 bits. How do you think it achieves greater range (max value ~3.4e38) compared to the same-sized int, for which the max value is ~2.1e9?

    The only possible answer is that it can't represent every integer on the way to the max value, and the gaps between representable numbers grow as the absolute value increases.

    Consider this code:

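    #include <cmath>
    #include <iostream>
    #include <limits>
    
    // Print n consecutive float values, starting at x.
    void foo(float x, int n)
    {
        while (n-- > 0)
        {
            // Separate values with a space; end the line after the last one.
            std::cout << x << "\n "[n > 0];
            // Step to the next representable float towards +infinity.
            x = std::nextafter(x, std::numeric_limits<float>::infinity());
        }
    }
    
    int main()
    {
        // Enough significant digits to print each float's exact decimal value.
        std::cout.precision(1000);
    
        foo(0.001, 3);      // 0.001 is first rounded to the nearest float
        foo(1, 3);
        foo(100000000, 3);
    }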
    

    It steps through the float values as slowly as possible, i.e. each step increments the value by the smallest possible amount.

    0.001000000047497451305389404296875 0.00100000016391277313232421875 0.001000000280328094959259033203125
    1 1.00000011920928955078125 1.0000002384185791015625
    100000000 100000008 100000016
    

    As you can see, near 100000000 it can only represent every 8th integer.