c++ double floating-accuracy double-precision atof

Arithmetic error with double c++


I have noticed a small error in some arithmetic calculations using double. It is really weird: there's always a small error and/or an extra significant digit.

First, I use atof to convert numbers with two significant digits that I read from a text file (I then store them in a vector):

  // Puts into vector
  double ask_file, bid_file; // Values of ask and bid from file
  double cur_conversion = 0.16;
  ask_file = cur_conversion * atof(values[0].c_str());
  bid_file = cur_conversion * atof(values[1].c_str());

Then I do the arithmetic (from another class, with two different objects):

diff = OKC->bid_val() - BV->ask_val(); // diff
diff2 = OKC->ask_val() - BV->bid_val(); // diff2

This is the output:

BV Askfile: 245.267 Bidfile: 245.078 
OKC Askfile: 248.82 Bidfile: 248.73 
diff: 3.4628 diff2: 3.7416

As you can see, there's an error in both calculations: diff should be 3.463 and NOT 3.4628, and diff2 should be 3.742 and NOT 3.7416.

Do you know what's going on??


Solution

  • The problem is that it is, in general, impossible to represent fractional decimal values exactly using binary floating-point numbers. For example, 0.1 is represented as 1.000000000000000055511151231257827021181583404541015625E-1 when using double (you can use this online analyzer to determine the values). When computing with these rounded values, the result may need more binary digits than can be represented, so it is rounded again, introducing more error. All of this is covered in Goldberg's paper, pointed to in the comment by Ed Heal.

    There are a number of alternative representations you can use to compute exactly with decimal values. Unless the representation is arbitrarily sized, it will be exact only within some range of values. Typical choices are:

    1. Using a big integer representation together with a suitable decimal scaling.
    2. Strings (or BCDs) of digits.
    3. A fixed-point representation, which is basically just an integer together with a fixed decimal exponent, where the exponent is implicit in the fixed-point type (or, e.g., a template argument).
    4. Decimal floating point instead of binary floating point. A floating-point number is just a representation of a sign, a significand, and an exponent, with the value being computed as (-1)^sign * significand * base^exponent. double uses a base of 2, but for decimal computations you'd use base 10.
    5. Using two big integers you could represent the value as a rational number.
    6. There are a couple of other choices but the above list is what I'd consider to be practical options.

    Depending on the choice of representation, different operations are more or less easy to implement, and which operations are exact varies. For example, except for the rational representation, the result of a division generally has to be rounded when the divisor has prime factors other than 2 and 5 (for example, 1/3).

    Which representation works best for your application depends on your needs. If you only have trade prices in ranges typical for equities, a fixed-point representation may work. If you need to cover all sorts of values you can encounter in finance, e.g., national debts as well as interest rates, you'd need more than 64 bits for a fixed-point representation, and decimal floating point may be the better choice. Depending on whether you need to transfer and/or store the values, a fixed-size representation may not be required, in which case the other representations may be reasonable choices.