I was surprised to find that a floating-point-to-integer conversion rounded up instead of truncating the fractional part. Here is some sample code, compiled using Clang, that reproduces that behavior:
double a = 1.12; // 1.1200000000000001 * 2^0
double b = 1024LL * 1024 * 1024 * 1024 * 1024; // 1 * 2^50
double c = a * b; // 1.1200000000000001 * 2^50
long long d = c; // 1261007895663739
Using exact math, the value the floating-point number represents is
1.1200000000000001 * 2^50 = 1261007895663738.9925899906842624
I was expecting the resulting integer to be 1261007895663738 due to truncation, but it is actually 1261007895663739. Why?
Assuming IEEE 754 double precision, the double closest to 1.12 is exactly
1.12000000000000010658141036401502788066864013671875
Written in binary, its significand is exactly:
1.0001111010111000010100011110101110000101000111101100
Note that the two trailing zeros are intentional: a double-precision significand has 53 bits (1 before the binary point plus 52 fractional bits), and here the last two of those 52 fractional bits happen to be zero.
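You can see this for yourself by printing the value with plenty of decimal digits and in hexadecimal floating-point form (%a), which exposes the significand bits four per hex digit. A minimal sketch, assuming a C99-conforming printf (the outputs in the comments are what Clang with glibc produces; other C libraries may pad the decimal output differently):

#include <stdio.h>

int main(void) {
    double a = 1.12;

    /* Enough decimal digits to show the exact stored value. */
    printf("%.55f\n", a);  /* 1.1200000000000001065814103640150278806686401367187500000 */

    /* Hexadecimal floating point: the significand bits, four per hex digit. */
    printf("%a\n", a);     /* 0x1.1eb851eb851ecp+0 */
    return 0;
}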
So, multiplying by 2^50 shifts the binary point 50 places to the right, and you get the integer value
100011110101110000101000111101011100001010001111011.00
or in decimal
1261007895663739
When converting to long long, no truncation or rounding occurs; the conversion is exact.
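To confirm there is nothing to truncate, you can check that c has no fractional part and already compares equal to the integer before the cast. A small sketch along the lines of the question's code:

#include <stdio.h>
#include <math.h>

int main(void) {
    double a = 1.12;                                /* exactly 1.1200000000000001065814... */
    double b = 1024LL * 1024 * 1024 * 1024 * 1024;  /* exactly 2^50 */
    double c = a * b;                               /* exact: multiplying by 2^50 only shifts the exponent */

    printf("fractional part: %g\n", c - floor(c));                     /* 0 */
    printf("c == 1261007895663739.0: %d\n", c == 1261007895663739.0);  /* 1 */
    printf("(long long)c = %lld\n", (long long)c);                     /* 1261007895663739 */
    return 0;
}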