Search code examples
csyntaxfloating-pointexpressionfloating-accuracy

C fundamentals: double variable not equal to double expression?


I am working with an array of doubles called indata (in the heap, allocated with malloc), and a local double called sum.

I wrote two different functions to compare values in indata, and obtained different results. Eventually I determined that the discrepancy was due to one function using an expression in a conditional test, and the other function using a local variable in the same conditional test. I expected these to be equivalent.

My function A uses:

    if (indata[i]+indata[j] > max) hi++;

and my function B uses:

    sum = indata[i]+indata[j];
    if (sum>max) hi++;

After going through the same data set and max, I end up with different values of hi depending on which function I use. I believe function B is correct, and function A is misleading. Similarly when I try the snippet below

    sum = indata[i]+indata[j];
    if ((indata[i]+indata[j]) != sum) etc.

that conditional will evaluate to true.

While I understand that floating point numbers do not necessarily provide an exact representation, why does that in-exact representation change when evaluated as an expression vs stored in a variable? Is recommended best practice to always evaluate a double expression like this prior to a conditional? Thanks!


Solution

  • I suspect you're using 32-bit x86, the only common architecture subject to excess precision. In C, expressions of type float and double are actually evaluated as float_t or double_t, whose relationships to float and double are reflected in the FLT_EVAL_METHOD macro. In the case of x86, both are defined as long double because the fpu is not actually capable of performing arithmetic at single or double precision. (It has mode bits intended to allow that, but the behavior is slightly wrong and thus can't be used.)

    Assigning to an object of type float or double is one way to force rounding and get rid of the excess precision, but you can also just add a gratuitous cast to (double) if you prefer to leave it as an expression without assignments.

    Note that forcing rounding to the desired precision is not equivalent to performing the arithmetic at the desired precision; instead of one rounding step (during the arithmetic) you now have two (during the arithmetic, and again to drop unwanted precision), and in cases where the first rounding gives you an exact-midpoint, the second rounding can go in the 'wrong' direction. This issue is generally called double rounding, and it makes excess precision significantly worse than nominal precision for certain types of calculations.