Say I have 2 double
values. One of them is very large and one of them is very small.
double x = 99....9; // I don't know the possible max and min values,
double y = 0,00..1; // so just assume these values are near max and min.
If I add those values together, do I lose precision?
In other words, does the max possible double
value increase if I assign an int
value to it? And does the min possible double
value decrease if I choose a small integer part?
double z = x + y; // Real result is something like 999999999999999.00000000000001
I'm using a decimal floating point arithmetic with a precision of three decimal digits and (roughly) with the same features as the typical binary floating point arithmetic. Say you have 123.0 and 4.56. These numbers are represented by a mantissa (0<=m<1) and an exponent: 0.123*10^3 and 0.456*10^1, which I'll write as <.123e3> and <.456e1>. Adding two such numbers isn't immediately possible unless the exponents are equal, and that's why the addition proceeds according to:
<.123e3> <.123e3>
<.456e1> <.004e3>
--------
<.127e3>
You see that the necessary alignment of the decimal digits according to a common exponent produces a loss of precision. In the extreme case, the entire addend could be shifted into nothingness. (Think of summing an infinite series where the terms get smaller and smaller but would still contribute considerably to the sum being computed.)
Other sources of imprecision result from differences between binary and decimal fractions, where an exact fraction in one base cannot be represented without error using the other one.
So, in short, addition and subtraction between numbers from rather different orders of magnitude are bound to cause a loss of precision.