c++ c double overflow underflow

How to use double to be more secure and precise?


The statements in each case are mathematically equivalent. My question is which one is better to choose when coding. Which form may overflow for some ranges of the variables while the other does not overflow for the same ranges? Which form is more precise, and why?

double x, y, z;

//case 1
x = (x * y) * z;
x *= y * z;

//case 2
z = x + x*y;
z = x * ( 1.0 + y);

//case 3
y = x/5.0;
y = x*0.2;

Solution

    // Case 1
    x = (x * y) * z;
    x *= y * z;
    
    // Case 2
    z = x + x*y;
    z = x * ( 1.0 + y);
    
    // Case 3
    y = x/5.0;
    y = x*0.2;
    

    Case 1: x *= y * z; is equivalent to x = x * (y * z); so this case is about evaluation order. If either sub-product (x * y or y * z) falls outside the range of double and becomes INF, 0.0, or a subnormal, the final product can differ significantly depending on the order. On the other hand, intermediate math may be performed in a wider floating-point type (search for FLT_EVAL_METHOD). In that case the order could be irrelevant, for example if all computation is done as long double.
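A minimal sketch of the order dependence in case 1, assuming IEEE 754 binary64 double. The helper names are hypothetical, and the volatile temporaries are an assumption made here to force each sub-product to be rounded to double even when the implementation evaluates expressions in a wider type:

```cpp
#include <cfloat>  // FLT_EVAL_METHOD
#include <cmath>   // std::isinf, std::isfinite (useful when inspecting results)

// Hypothetical helper: (x * y) * z, as in "x = (x * y) * z;".
// Storing x * y in a volatile double makes it overflow or underflow
// at double range before the second multiplication.
double product_left_first(double x, double y, double z) {
    volatile double xy = x * y;
    return xy * z;
}

// Hypothetical helper: x * (y * z), as in "x *= y * z;".
double product_right_first(double x, double y, double z) {
    volatile double yz = y * z;
    return x * yz;
}

// Reports how the implementation evaluates floating-point expressions:
// 0 = in the operand's own type, 1 = float and double as double,
// 2 = everything as long double, negative = indeterminable.
int eval_method() {
    return FLT_EVAL_METHOD;
}
```

With x = 1e200, y = 1e200, z = 1e-200, the left-first product overflows to infinity because x * y exceeds the double range, while the right-first product stays finite at roughly 1e200 because y * z is close to 1.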

    Case 2: The two forms are slightly different. The second is numerically more stable because its addition uses exact operands, 1.0 and y, whereas the first adds x and x*y, where x*y is potentially a rounded result. Addition/subtraction is prone to severe precision loss through cancellation, in this case when y is near -1.0. As in case 1, wider intermediate math helps, but the second form is still better.

    C99 and later (and C++11) offer fma(double x, double y, double z), and using fma(x, y, x) would be another good alternative.

    The fma functions compute (x × y) + z, rounded as one ternary operation: they compute the value (as if) to infinite precision and round once to the result format, according to the current rounding mode. A range error may occur.
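The three ways of writing case 2 can be compared with a small sketch, assuming IEEE 754 binary64 double and round-to-nearest. The function names are hypothetical, and the volatile temporaries are an assumption made here to keep the compiler from fusing the operations or using wider intermediates:

```cpp
#include <cmath>  // std::fma, std::ldexp (ldexp is handy for building test inputs)

// First form: z = x + x*y. The product is rounded to double before the add.
double sum_form(double x, double y) {
    volatile double xy = x * y;
    return x + xy;
}

// Second form: z = x * (1.0 + y). The sum is rounded to double before the multiply.
double factored_form(double x, double y) {
    volatile double one_plus_y = 1.0 + y;
    return x * one_plus_y;
}

// Fused form: x*y + x computed as if to infinite precision, rounded once.
double fused_form(double x, double y) {
    return std::fma(x, y, x);
}
```

For x = 1 + 2^-30 and y = -1 + 2^-30, the exact result is 2^-30 + 2^-60. The factored and fused forms return exactly that, while the first form returns 2^-30 because x*y rounds to -1.0 before the addition cancels.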

    Case 3:

    The "trick" here is whether double 0.2 is the same as mathematical 0.2. Typically it is not, although the two are close. An optimizing compiler could 1) treat them as the same, or 2) as in case 1, use wider math. Then the result is the same for both lines of code.

    Otherwise, depending on the rounding mode, the two forms may differ in the least significant bit (1 ULP). With a weak compiler, prefer /5.0.

    Division by 5.0 is more accurate than multiplication by an approximate 0.2. But coded either way, a smart compiler may do a wide multiplication for both.
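A short sketch of the case 3 difference, assuming IEEE 754 binary64 double, the default round-to-nearest mode, and no wider intermediate math. The helper names are hypothetical:

```cpp
#include <cmath>  // std::nextafter

// y = x/5.0: one correctly rounded division by an exactly representable 5.0.
double divide_by_five(double x) {
    return x / 5.0;
}

// y = x*0.2: the constant 0.2 is already rounded before the multiplication.
double times_one_fifth(double x) {
    return x * 0.2;
}

// Hypothetical helper: true when b is the double immediately adjacent to a.
bool one_ulp_apart(double a, double b) {
    return a != b && std::nextafter(a, b) == b;
}
```

Under those assumptions, x = 3.0 is one input where the two forms disagree: divide_by_five(3.0) equals the double nearest to 0.6, while times_one_fifth(3.0) lands on the next representable value above it.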