While writing a custom floating-point number parser (for speed reasons) and checking its precision against strtod (which I assume to be extremely accurate), I found that the naive approach of using
number = (int_part + dec_part/pow(10., no_of_decs)) * pow(10., expo)
sometimes seems to be "more accurate" than the strtod result (when the computation is done in long double and the result is then converted back to a double), which is surprising.
Do the official IEEE 754 parsing rules actually mandate a less accurate result?
For example, with the string
708530856168225829.3221614e9
the naive computation gives
7.08530856168225761e+26
which seems closer than the strtod result
7.08530856168225898e+26
to the "theoretical" result (which cannot be represented exactly by a 64-bit double)
7.085308561682258293221614e+26
(Experiments were done with g++ (GCC) 10.2.0 and clang++ 11.1.0 on Arch Linux; both agree on ...898e+26 for strtod and ...761e+26 for the naive computation.)
As you note, 7.085308561682258293221614e+26 is not representable in IEEE-754 double precision (binary64). Therefore, it is not a candidate result; it serves only as the reference point for choosing among the representable values.
The two numbers representable in binary64 closest to 708530856168225829.3221614e9 are 708530856168225760595673088 and 708530856168225898034626560. Writing out the original in full and lining them up for inspection, with the original in the middle, we have:
708530856168225760595673088  representable value below original
708530856168225829322161400  original number
708530856168225898034626560  representable value above original
Subtracting gives the absolute differences between the lower and the original and between the original and the higher:
68726488312  distance to lower
68712465160  distance to higher
and therefore the higher number, 708530856168225898034626560, is closer to the original. This is exactly the result you report for strtod, so the software is behaving correctly.
Observe that it is a mistake to think of binary64 values in decimal without all their significant digits. Writing out the partial decimal numerals in the same way as the full numbers above, we have:
7.08530856168225761e+26       proposed result
708530856168225829.3221614e9  original number
7.08530856168225898e+26       reported result of strtod
with differences:
68322161400  distance to lower
68677838600  distance to higher
Thus, rounding the actual values of the floating-point numbers to decimal numerals with fewer digits introduced errors large enough to make the lower value appear closer. Binary floating-point numbers are not decimal numerals and do not represent them, and displaying them without all significant digits shows incorrect values.