Tags: c, floating-point, clang, ieee-754, floating-point-conversion

Floating-point-to-integer conversion rounding up instead of truncating


I was surprised to find that a floating-point-to-integer conversion rounded up instead of truncating the fractional part. Here is some sample code, compiled using Clang, that reproduces that behavior:

double a = 1.12;  // 1.1200000000000001 * 2^0
double b = 1024LL * 1024 * 1024 * 1024 * 1024;  // 1 * 2^50
double c = a * b;  // 1.1200000000000001 * 2^50
long long d = c;  // 1261007895663739

Using exact math, the floating-point value represents

1.1200000000000001 * 2^50 = 1261007895663738.9925899906842624

I was expecting the resulting integer to be 1261007895663738 due to truncation but it is actually 1261007895663739. Why?


Solution

  • Assuming IEEE 754 double precision, the double nearest to 1.12 (the value actually stored in a) is exactly

    1.12000000000000010658141036401502788066864013671875
    

    Written in binary, its significand is exactly:

    1.0001111010111000010100011110101110000101000111101100
    

    Note that the two trailing zeros are intentional: a double-precision significand has 1 bit before the binary point plus 52 fraction bits, and for this value the last two fraction bits happen to be zero.

    So if you multiply by 2^50, that is, shift the binary point 50 places to the right, nothing but zeros remains after the point and you get an integer value

    100011110101110000101000111101011100001010001111011.00
    

    or in decimal

    1261007895663739
    

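    Multiplying by 2^50 only changes the exponent, so the product c is exact as well. When converting to long long, no truncation or rounding occurs; the conversion is exact. The figure 1261007895663738.99... in the question comes from multiplying the rounded 17-digit decimal 1.1200000000000001 by 2^50, not the exact stored value shown above.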