In what follows, I take it for granted that the IEEE-754 double-precision floating-point format is used.
Python: "...almost all machines use IEEE 754 binary floating-point arithmetic, and almost all platforms map Python floats to IEEE 754 binary64 'double precision' values."
C++: in practice, double maps to IEEE-754 double precision as well (std::numeric_limits<double>::is_iec559 reports whether it does).
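Both statements are easy to sanity-check; in Python, for instance, sys.float_info exposes the parameters of the underlying C double:

import sys

# binary64 parameters: 53 significand bits (incl. the implicit bit), eps = 2**-52
print(sys.float_info.mant_dig)           # 53
print(sys.float_info.epsilon == 2**-52)  # True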
Machine epsilon is the smallest epsilon such that fl(1 + epsilon) > 1.
For double precision, formally epsilon = 2^-52 (the spacing between 1.0 and the next representable double), but because of round-to-nearest the quantity usually quoted is the unit roundoff epsilon/2 = 2^-53: any value strictly greater than 2^-53 already pushes fl(1 + x) above 1.
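A small sketch of that distinction (math.nextafter needs Python 3.9+; the 2**-60 below is just an arbitrary tiny increment I picked for illustration):

import math, sys

print(sys.float_info.epsilon == 2**-52)          # True: gap between 1.0 and the next double
print(math.nextafter(1.0, 2.0) == 1.0 + 2**-52)  # True: 1 + 2**-52 is that next double
print(1.0 + 2**-53 == 1.0)                       # True: the tie rounds back to the even 1.0
print(1.0 + (2**-53 + 2**-60) > 1.0)             # True: anything strictly above 2**-53 bumps the result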
In Python (Spyder), when I do:
i = 1.0 + 2**-53
print(i)
>> 1.0
C++ version:
#include <iostream>
#include <cmath>
#include <iomanip>

int main() {
    double i = std::pow(2.0, -53);
    double j = 1.0 + i;  // exact sum 1 + 2^-53 is a tie; rounds to the even neighbour 1.0
    std::cout << j;
    return 0;
}
>> 1
Furthermore, when I do the following in Python:
i = 1.0 + 2**-52 + 2**-53
print(i)
>> 1.0000000000000004
C++ version:
#include <iostream>
#include <cmath>
#include <iomanip>

int main() {
    double i = std::pow(2.0, -52);
    double j = 1.0 + i;                 // exactly representable: 1 + 2^-52
    double k = j + std::pow(2.0, -53);  // tie; rounds up to the even neighbour 1 + 2^-51
    std::cout << std::setprecision(17) << k;
    return 0;
}
>> 1.0000000000000004
where the order of addition is from left to right, so the non-associativity of floating-point addition does not come into play, i.e. i = 1.0 + 2**-52 + 2**-53 is evaluated as i = (1.0 + 2**-52) + 2**-53.
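This grouping is easy to confirm:

i = 1.0 + 2**-52 + 2**-53
j = (1.0 + 2**-52) + 2**-53
print(i == j)
>> True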
The weird (to me) part happens here. First, 1 + 2**-52 is stored exactly in the double-precision format (i.e. 1 + 2*epsilon with epsilon = 2**-53), as sign + exponent + fraction:
0 + 01111111111 + 000...01
Moreover, 2**-53 is stored as:
0 + 01111001010 + 000...00
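These bit patterns can be checked by reinterpreting the doubles as 64-bit integers with the standard struct module (the 1/11/52 split below is mine, purely for readability):

import struct

def fields(x):
    bits = format(struct.unpack('>Q', struct.pack('>d', x))[0], '064b')
    return bits[0], bits[1:12], bits[12:]  # sign, exponent, fraction

print(fields(1.0 + 2**-52))  # ('0', '01111111111', '0000...0001')
print(fields(2**-53))        # ('0', '01111001010', '0000...0000')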
If we write these in the form (mantissa)*(2**exponent), then 1 + 2**-52 is (1.0000...01)*(2**0) and 2**-53 is (1.0000...00)*(2**-53), where the mantissa (1.xxxx...xx) is 53 bits long (including the implicit leading 1 bit). A bitwise addition requires the same exponent, thus I shift the mantissa of the smaller one (i.e. 2**-53) so that it has exponent 0:
(1.0000...00)*(2**-53) -> (0.|52 zeros here|10000...00)*(2**0)
So the addition looks like:

  1.0000...010   (1 + 2**-52: 51 zeros after the radix point, then '1' and '0', 53 digits in total)
+ 0.0000...001   (shifted 2**-53: 52 zeros after the radix point, then '1', 53 digits in total)
  ------------
  1.0000...011   (53 digits after the radix point)
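The same addition can be reproduced with integer significands by scaling everything by 2**53 (a scaling I chose purely for this illustration):

a = (1 << 53) + 2  # 1 + 2**-52 == (2**53 + 2) * 2**-53
b = 1              # shifted 2**-53 == 1 * 2**-53
s = a + b          # exact sum, no rounding in integer arithmetic
print(format(s, 'b'))       # 1, then 51 zeros, then 11: the '1.0000...011' above
print(len(format(s, 'b')))  # 54: one bit too many for a 53-bit significand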
Now, the mantissa should be 53 bits in total, but the exact sum above has 54, so it has to be rounded.
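That the value being rounded is a genuine tie, i.e. exactly halfway between the two neighbouring doubles, can be confirmed with exact rational arithmetic via fractions.Fraction:

from fractions import Fraction

exact = Fraction(1) + Fraction(1, 2**52) + Fraction(1, 2**53)
lo = Fraction(1) + Fraction(1, 2**52)  # nearest double below
hi = Fraction(1) + Fraction(1, 2**51)  # nearest double above
print(exact - lo == hi - exact)
>> True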
My question begins here: is the reason both of these languages print 1.0000000000000004 for 1.0 + 2**-52 + 2**-53 that the round-ties-to-even rule is applied, so that 1.0000...011 is rounded to 1.0000...10, which is essentially 1.0000000000000004 at 16 decimal digits of precision? Or is it something else entirely and I made a mistake in the calculations?
Sorry if this seems like overkill or overthinking on such a simple subject, but it has been bothering me for days and I could not figure out the reason or verify my thoughts. Any answer or comment is appreciated.
Yes, it's entirely about round-to-nearest/even. Python's float.hex() can show you the bits directly:
>>> 1.0 + 2**-52 + 2**-53
1.0000000000000004
>>> _.hex()
'0x1.0000000000002p+0'
Although Python and C++ have little to do with this: it's almost certainly a result of how your CPU/FPU implements float addition (they almost certainly implement 754-style nearest/even rounding in hardware by default).
Note too that associativity doesn't matter in this specific example: 1.0 + 2**-52 and 2**-52 + 2**-53 are both exactly representable on their own, so with either grouping the final addition rounds exactly the same real value.
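For instance:

>>> (2**-52 + 2**-53).hex()  # exactly representable: 1.5 * 2**-52
'0x1.8000000000000p-52'
>>> (1.0 + 2**-52) + 2**-53 == 1.0 + (2**-52 + 2**-53)
True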