Tags: parsing, floating-point, precision, ieee-754

Why does C++ strtod parse "708530856168225829.3221614e9" as 7.08530856168225898e+26 instead of 7.08530856168225761e+26?


While writing a custom floating-point number parser (for speed reasons) and checking its precision against strtod (which I assume to be extremely accurate), I found that sometimes the naive approach of using

 number = (int_part + dec_part/pow(10., no_of_decs)) * pow(10., expo)

seems to actually be "more accurate" than the strtod result (when the computation is done in long double and the result is then converted back to double), which is surprising.

Do the official IEEE 754 parsing rules actually mandate a less accurate result?

For example with the string

 708530856168225829.3221614e9

the naive computation gives

 7.08530856168225761e+26

which seems closer than the result of strtod

 7.08530856168225898e+26

to the "theoretical" result (which cannot be represented exactly by a 64-bit double)

 7.085308561682258293221614e+26

(Experiments were done with g++ (GCC) 10.2.0 and clang++ 11.1.0 on Arch Linux; both agree on ...898e+26 for strtod and ...761e+26 for the naive computation.)


Solution

  • As you note, 7.085308561682258293221614e+26 is not representable in IEEE-754 double precision (binary64). Therefore, it is not a candidate result and plays no role in determining the result.

    The two numbers representable in binary64 that are closest to 708530856168225829.3221614e9 are 708530856168225760595673088 and 708530856168225898034626560. Writing out the original in full and lining the three up for inspection, with the original in the middle, we have:

    708530856168225760595673088   representable value below original
    708530856168225829322161400   original number
    708530856168225898034626560   representable value above original
    

    Subtracting gives the absolute differences between the lower and the original and between the original and the higher:

                    68726488312   distance to lower
                    68712465160   distance to higher
    

    and therefore the higher number, 708530856168225898034626560, is closer to the original. This is in fact the result you report, and therefore the software is behaving correctly.

    Observe that it is a mistake to think of binary64 in decimal without all significant digits. Writing out the partial decimal numerals as we did the full numbers above, we have:

    7.08530856168225761e+26         proposed result
    7.085308561682258293221614e+26  original number
    7.08530856168225898e+26         reported result of strtod
    

    with differences:

                     68322161400   distance to lower
                     68677838600   distance to higher
    

    Thus, rounding the actual values of the floating-point numbers to decimal numerals without all the digits introduced errors and portrayed incorrect values. Binary floating-point numbers are not and do not represent decimal numerals, and displaying them without all significant digits shows incorrect values.