c++ floating-point double rounding

What does "double + 1e-6" mean?


This C++ code prints 72.740, but the answer should be 72.741:

mx = 72.74050000;
printf("%.3lf \n", mx);

I found a solution on a website that said to add "+1e-7", and it works:

mx = 72.74050000;
printf("%.3lf \n", mx + 1e-7);

But I don't understand why this method works. Can anyone explain it?

I also tried printing the sum directly, but nothing special happened; the output was just 72.7405:

mx = 72.74050003;
cout << mx + 1e-10;

Solution

  • To start, your question contains an incorrect assumption. You put in 72.7405 (let's assume it is exact) and expect 72.741 on output. So you assume that rounding in printf will pick the higher of the two candidates. Why?

    Well, that may simply be your task, dictated by some rules (e.g. fiscal norms for rounding in bills, taxation, etc.); this is common. But when you use the de facto standard floating point of C/C++ on x86, ARM, etc., you should take the following specifics into account:

    1. It is binary, not decimal. As a result, all the values in your example are stored with some error.
    2. The standard library uses the default rounding unless it is forced to use another method.

    The second point means that the default rounding in C floating point is round-to-nearest, ties-to-even (half-to-even for short). Under this rule, an exact 72.7405 rounds to 72.740, not 72.741 (while 72.7415 rounds to 72.742). To get 72.7405 -> 72.741 you would need a different rounding mode: round-to-nearest, ties-away-from-zero (round-half-away for short). IEEE 754 requires this mode to be available only for decimal arithmetic. So, if you used true decimal arithmetic, it would suffice.

    (If we don't allow negative numbers, the same mode might be treated as half-up. But I assume negative numbers are not permitted in financial accounting and similar contexts.)
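    Both points can be checked with a small sketch. It assumes a C library whose printf rounds the exact binary value under the default rounding mode, as glibc does; other libraries may print different digits. The helper name fmt is made up for this illustration. The values 0.125 and 0.375 are exactly representable in binary, so they are true ties, while 72.7405 is not.

    ```cpp
    #include <cstdio>
    #include <string>

    // Hypothetical helper: format x with the given number of decimals,
    // using whatever rounding the C library's printf applies.
    std::string fmt(double x, int decimals) {
        char buf[64];
        std::snprintf(buf, sizeof buf, "%.*f", decimals, x);
        return buf;
    }

    int main() {
        // Exactly representable ties: half-to-even picks the even last digit.
        // With glibc this prints "0.12 0.38".
        std::printf("%s %s\n", fmt(0.125, 2).c_str(), fmt(0.375, 2).c_str());

        // 72.7405 is not exactly representable; show the stored value.
        std::printf("%s\n", fmt(72.7405, 20).c_str());

        // The stored double lies slightly below 72.7405, so it is not a tie
        // at all: it simply rounds down, and +1e-7 lifts it past the midpoint.
        std::printf("%s %s\n", fmt(72.7405, 3).c_str(),
                    fmt(72.7405 + 1e-7, 3).c_str());
    }
    ```

    So in the question's case the result 72.740 comes from the representation error rather than from the tie rule itself, which is also why such a tiny addend was enough there.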

    But the first point is more important here: the representation error of such values can be amplified by arithmetic operations. Here is your situation and the proposed solution again, with more cases:

    Code:

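    #include <stdio.h>
    int main()
    {
      float mx;                      /* single precision on purpose */
      mx = 72.74050000;
      printf("%.6lf\n", mx);         /* value actually stored */
      printf("%.3lf\n", mx + 1e-7);  /* the "+1e-7" trick */
      mx *= 3;                       /* one operation amplifies the error */
      printf("%.6lf\n", mx);
      printf("%.3lf\n", mx + 1e-7);  /* the trick no longer helps */
    }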
    

    Result (Ubuntu 20.04/x86-64):

    72.740501
    72.741
    218.221497
    218.221
    

    So you see that merely multiplying your example number by 3 leaves the compensating term 1e-7 too small to force half-up rounding, and 218.2215 (the "exact" 72.7405*3) is rounded to 218.221 instead of the desired 218.222. Oops, "Directed by Robert B. Weide"...

    How could the situation be fixed? Well, you could start with a stronger, rougher approach. If you need rounding to 3 decimal digits, but your inputs have 4 digits, add 0.00005 (half a unit of the fourth decimal place, the last digit of your inputs) instead of this powerless and sluggish 1e-7. This will definitely push values sitting at the halfway point up.
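    As a rough sketch of that difference, the snippet below takes the same float value as the example above (72.7405 multiplied by 3) and formats it with each addend. The helper name fmt3_biased is made up for this illustration, and the expected results in the comments assume a printf that rounds the exact binary value, as glibc does.

    ```cpp
    #include <cstdio>
    #include <string>

    // Hypothetical helper: add a bias, then let printf round to 3 decimals.
    std::string fmt3_biased(double x, double bias) {
        char buf[64];
        std::snprintf(buf, sizeof buf, "%.3f", x + bias);
        return buf;
    }

    int main() {
        float mx = 72.7405f;
        mx *= 3;  // stored as roughly 218.221497, below the intended 218.2215

        // 1e-7 is far smaller than the error already in mx: prints 218.221
        std::printf("%s\n", fmt3_biased(mx, 1e-7).c_str());

        // 0.00005 is larger than that error: prints 218.222
        std::printf("%s\n", fmt3_biased(mx, 0.00005).c_str());
    }
    ```

    Note that this bias is only safe when the inputs really carry at most 4 decimal digits: a value such as 72.74046 would be pushed to 72.741 as well.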

    But all this works only if the result before rounding has an error strictly less than 0.00005. With cumbersome calculations (e.g. summing hundreds of values), the accumulated error can easily exceed this threshold. To avoid that, you would have to round intermediate results often (ideally, every value).

    And this last conclusion leads us to the final question: if we need to round each intermediate result anyway, why not just move the calculations to integers? Do you have to keep intermediate results to 4 decimal digits? Scale by 10000 and do all calculations in integers. This will also help avoid silent(*) accuracy loss at larger magnitudes.
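    A minimal sketch of the scaled-integer idea, assuming values are held as 64-bit counts of 1/10000 and that the desired rule is round-half-away-from-zero to 3 decimals. The function names are made up for this illustration; a real fixed-point library would offer more than this.

    ```cpp
    #include <cstdint>
    #include <cstdio>
    #include <string>

    // Input: a count of ten-thousandths (assumed scale of 10000).
    // Output: a count of thousandths, with ties rounded away from zero.
    std::int64_t to_thousandths_half_away(std::int64_t ten_thousandths) {
        std::int64_t q = ten_thousandths / 10;  // truncates toward zero
        std::int64_t r = ten_thousandths % 10;  // has the sign of the dividend
        if (r >= 5) ++q;
        else if (r <= -5) --q;
        return q;
    }

    // Print a count of thousandths as a decimal string such as "72.741".
    std::string format_thousandths(std::int64_t t) {
        std::int64_t a = t < 0 ? -t : t;
        char buf[64];
        std::snprintf(buf, sizeof buf, "%s%lld.%03lld", t < 0 ? "-" : "",
                      static_cast<long long>(a / 1000),
                      static_cast<long long>(a % 1000));
        return buf;
    }

    int main() {
        std::int64_t mx = 727405;  // 72.7405, stored exactly
        std::printf("%s\n",
            format_thousandths(to_thousandths_half_away(mx)).c_str());  // 72.741
        mx *= 3;                   // 2182215, still exact
        std::printf("%s\n",
            format_thousandths(to_thousandths_half_away(mx)).c_str());  // 218.222
    }
    ```

    Because 72.7405 and its triple are exact integers here, the tie is a real tie and the rounding rule, not representation error, decides the result.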

    (*) Well, IEEE 754 requires raising the "inexact" flag, but with binary floating point nearly any operation on decimal fractions raises it, so the useful signal drowns in a sea of noise.
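    The flag can be observed through the standard <cfenv> interface. The sketch below is only illustrative: the helper name add_is_inexact is made up, it assumes the platform defines FE_INEXACT, and it relies on volatile to keep the addition from being folded at compile time, since compilers differ in how strictly they honour floating-point environment access.

    ```cpp
    #include <cfenv>
    #include <cstdio>

    // Hypothetical helper: returns true if computing a + b raised "inexact".
    bool add_is_inexact(double a, double b) {
        volatile double x = a, y = b;    // force run-time operands
        std::feclearexcept(FE_INEXACT);  // start from a clean flag
        volatile double sum = x + y;     // the operation under test
        (void)sum;
        return std::fetestexcept(FE_INEXACT) != 0;
    }

    int main() {
        // Decimal fractions: the sum has to be rounded, so the flag is raised.
        std::printf("0.1 + 0.2   inexact: %d\n", add_is_inexact(0.1, 0.2));
        // Binary fractions: the sum is exact, so the flag stays clear.
        std::printf("0.5 + 0.25  inexact: %d\n", add_is_inexact(0.5, 0.25));
    }
    ```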

    So the final conclusion answers not your question but the underlying task: use fixed-point approaches. The +1e-7 approach, as I showed above, fails far too easily. No, don't use it, no, never. There are lots of proper libraries for fixed-point arithmetic; just pick one and use it.

    (It is also interesting that %.6f printed 72.740501, yet 218.221497/3 == 72.740499. This suggests that single precision (float in C) is too inaccurate here. But even setting the +1e-7 approach aside, switching to double only postpones the issue, masking it and making a wrong method look correct.)
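    To put numbers on that, here is a small sketch that measures the gap between neighbouring representable values near the magnitudes used above. The helper name spacing is made up for this illustration; std::nextafter is the standard <cmath> function.

    ```cpp
    #include <cmath>
    #include <cstdio>
    #include <limits>

    // Hypothetical helper: distance from x to the next representable value above it.
    template <typename T>
    T spacing(T x) {
        return std::nextafter(x, std::numeric_limits<T>::infinity()) - x;
    }

    int main() {
        // float near 72.7405: gap is 2^-17, about 7.6e-06
        std::printf("float  near 72.7405:  %g\n",
                    static_cast<double>(spacing(72.7405f)));
        // float near 218.2215: gap is 2^-16, about 1.5e-05
        std::printf("float  near 218.2215: %g\n",
                    static_cast<double>(spacing(218.2215f)));
        // double near 72.7405: gap is 2^-46, about 1.4e-14
        std::printf("double near 72.7405:  %g\n", spacing(72.7405));
    }
    ```

    With float, the stored value can be off by up to half of that gap, which is dozens of times larger than 1e-7. With double the gap is tiny, but accumulated error across many operations can still grow past any fixed addend.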