Search code examples
c++floating-pointprecisiongotw

A example in Gotw 67


There is a example in http://www.gotw.ca/gotw/067.htm

int main()
{
  double x = 1e8;
  //float x = 1e8;
  while( x > 0 )
  {
    --x;
  }
}

When you change the double to float, it's a infinite loop in VS2008. According to the Gotw explanation:

What if float can't exactly represent all integer values from 0 to 1e8? Then the modified program will start counting down, but will eventually reach a value N which can't be represented and for which N-1 == N (due to insufficient floating-point precision)... and then the loop will stay stuck on that value until the machine on which the program is running runs out of power.

From what I understand, the IEEE754 float is a single precision(32 bits) and the range of float should be +/- 3.4e +/- 38 and it should have a 7 digits significant.

But I still don't understand how exactly this happens: "eventually reach a value N which can't be represented and for which N-1 == N (due to insufficient floating-point precision)." Can someone try to explan this bit ?

A bit of extra info : When I use double x = 1e8, it finished in about 1 sec, when I change it to float x = 1e8, it runs much longer(still running after 5 min), also if I change it to float x = 1e7;, it finished in about 1 second.

My testing environment is VS2008.

BTW I'm NOT asking the basic IEEE 754 format explanation as I already understand that.

Thanks


Solution

  • Well, for the sake of argument, lets assume we have a processor which represents a floating point number with 7 significant decimal digits, and an mantissa with, say, 2 decimal digits. So now the number 1e8 would be stored as

    1.000 000 e 08
    

    (where the "." and "e" need not be actually stored.)

    So now you want to compute "1e8 - 1". 1 is represented as

    1.000 000 e 00
    

    Now, in order to do the subtraction we first do a subtraction with infinite precision, then normalize so that the first digit before the "." is between 1 and 9, and finally round to the nearest representable value (with break on even, say). The infinite precision result of "1e8 - 1" is

    0.99 999 999 e 08
    

    or normalized

    9.9 999 999 e 07
    

    As can be seen, the infinite precision result needs one more digit in the significand than what our architecture actually provides; hence we need to round (and re-normalize) the infinitely precise result to 7 significant digits, resulting in

    1.000 000 e 08
    

    Hence you end up with "1e8 - 1 == 1e8" and your loop never terminates.

    Now, in reality you're using IEEE 754 binary floats, which are a bit different, but the principle is roughly the same.