Many questions have been asked on Stack Overflow and elsewhere about Python's confusing behaviour with calculations using floats, which often return a result that is clearly wrong by a tiny amount. The explanation invariably links to the official Python docs on floating-point arithmetic, but a practical, simple solution is not usually provided.
It isn't just the error (which is usually negligible) - it is more the mess and inelegance of getting a result like 3.999999999999999 for a simple sum like 8.7 - 4.7.
I have written a simple solution for this, and my question is: why isn't something like this automatically implemented by Python behind the scenes?
The basic concept is to convert each float into an integer, perform the operation, and then convert back appropriately to a float. The difficulties explained in the linked doc apply only to floats, not to ints, which is why this works. Here is the code:
    def justwork(x, operator, y):
        # Count the decimal places of each operand
        numx = numy = 0
        if "." in str(x):
            numx = len(str(x)) - str(x).find(".") - 1
        if "." in str(y):
            numy = len(str(y)) - str(y).find(".") - 1
        num = max(numx, numy)
        factor = 10 ** num
        # Scale both operands up to whole numbers
        newx = x * factor
        newy = y * factor
        if operator == "%":
            ans1 = x % y
            ans = (newx % newy) / factor
        elif operator == "*":
            ans1 = x * y
            ans = (newx * newy) / (factor ** 2)
        elif operator == "-":
            ans1 = x - y
            ans = (newx - newy) / factor
        elif operator == "+":
            ans1 = x + y
            ans = (newx + newy) / factor
        elif operator == "/":
            ans1 = x / y
            ans = newx / newy
        elif operator == "//":
            ans1 = x // y
            ans = newx // newy
        return (ans, ans1)
This is admittedly rather inelegant and could probably be improved with a bit of thought, but it gets the job done. The function returns a tuple with the correct result (by converting to integer), and the incorrect result (automatically provided). Here are examples of how this provides accurate results, as opposed to doing it normally.
    # code                             # returns tuple with (correct, incorrect) result
    print(justwork(0.7, "%", 0.1))     # (0.0, 0.09999999999999992)
    print(justwork(0.7, "*", 0.1))     # (0.07, 0.06999999999999999)
    print(justwork(0.7, "-", 0.2))     # (0.5, 0.49999999999999994)
    print(justwork(0.7, "+", 0.1))     # (0.8, 0.7999999999999999)
    print(justwork(0.7, "/", 0.1))     # (7.0, 6.999999999999999)
    print(justwork(0.7, "//", 0.1))    # (7.0, 6.0)
TLDR: Essentially the question is: why are floats stored as base-2 binary fractions (which are inherently imprecise) when they could be stored the same way as integers (which just work)?
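The "inherently imprecise" part can be seen directly from Python itself: a float is stored as a ratio whose denominator is a power of two, so a decimal fraction like 0.1 can only be stored as the nearest such ratio. As a quick illustration:

```python
# float.as_integer_ratio() exposes the exact binary fraction a float stores.
# 0.1 has no exact power-of-two representation, so Python keeps the nearest one:
num, den = (0.1).as_integer_ratio()
print(num, den)                      # 3602879701896397 36028797018963968
print(den == 2 ** 55)                # True: the denominator is a power of two
# 0.5 = 1/2 IS a binary fraction, so it is stored exactly:
print((0.5).as_integer_ratio())      # (1, 2)
```

So the imprecision is baked into the representation itself, before any arithmetic happens.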
Three points:

1. Python has a built-in decimal module which always provides accurate answers (even in cases where the justwork() function in the question fails to).
2. Using the decimal module slows things down considerably - taking roughly 100 times longer.
3. The default approach therefore sacrifices accuracy to prioritise speed. [Whether making this the default is the right approach is debatable.]

To illustrate these three points, consider the following functions, loosely based on the one in the question:
    def justdoesntwork(x, operator, y):
        numx = numy = 0
        if "." in str(x):
            numx = len(str(x)) - str(x).find(".") - 1
        if "." in str(y):
            numy = len(str(y)) - str(y).find(".") - 1
        factor = 10 ** max(numx, numy)
        newx = x * factor
        newy = y * factor
        if operator == "+":    myAns = (newx + newy) / factor
        elif operator == "-":  myAns = (newx - newy) / factor
        elif operator == "*":  myAns = (newx * newy) / (factor ** 2)
        elif operator == "/":  myAns = newx / newy
        elif operator == "//": myAns = newx // newy
        elif operator == "%":  myAns = (newx % newy) / factor
        return myAns
and
    from decimal import Decimal

    def doeswork(x, operator, y):
        if operator == "+":    decAns = Decimal(str(x)) + Decimal(str(y))
        elif operator == "-":  decAns = Decimal(str(x)) - Decimal(str(y))
        elif operator == "*":  decAns = Decimal(str(x)) * Decimal(str(y))
        elif operator == "/":  decAns = Decimal(str(x)) / Decimal(str(y))
        elif operator == "//": decAns = Decimal(str(x)) // Decimal(str(y))
        elif operator == "%":  decAns = Decimal(str(x)) % Decimal(str(y))
        return decAns
and then looping through many values to find where myAns differs from decAns:
    operatorlist = ["+", "-", "*", "/", "//", "%"]
    for a in range(1, 1000):
        x = a / 10
        for b in range(1, 1000):
            y = b / 10
            for operator in operatorlist:
                myAns = justdoesntwork(x, operator, y)
                decAns = doeswork(x, operator, y)
                if float(decAns) != myAns and len(str(decAns)) < 5:
                    print(x, "\t", operator, " \t ", y, " \t= ", decAns, "\t\t{", myAns, "}")
=> this goes through all values to 1 d.p., from 0.1 to 99.9, and indeed fails to find any values where myAns differs from decAns.
However, if it is changed to 2 d.p. (i.e. either x = a/100 or y = b/100), many examples appear. For example, 0.1 + 1.09: this can easily be checked by typing ((0.1*100) + (1.09*100)) / 100 in the console, which uses the basic method of the question and returns 1.1900000000000002 instead of 1.19. The source of the error is 1.09*100, which returns 109.00000000000001. [Simply typing 0.1 + 1.09 gives the same error.] So the approach suggested in the question doesn't always work.
Using Decimal(), however, returns the correct answer: Decimal('0.1') + Decimal('1.09') returns Decimal('1.19').

[Note: don't forget to enclose the 0.1 and 1.09 in quotes. If you don't, Decimal(0.1) + Decimal(1.09) returns Decimal('1.190000000000000085487172896'), because it starts from the float 0.1, which is already stored inaccurately, and converts that inaccurate value to Decimal - GIGO. Decimal() has to be fed a string. Taking a float, converting it to a string and from there to Decimal does work, though; the problem only arises when going directly from float to Decimal.]
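The three construction routes side by side make the point clear:

```python
from decimal import Decimal

# From a string: the intended decimal value is preserved exactly.
print(Decimal('0.1') + Decimal('1.09'))        # 1.19

# Directly from a float: the float's binary error is faithfully copied in.
print(Decimal(0.1) + Decimal(1.09))            # 1.190000000000000085487172896

# Float -> str -> Decimal: str() gives the short decimal form, so this works too.
print(Decimal(str(0.1)) + Decimal(str(1.09)))  # 1.19
```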
In terms of time cost, run this:
    import timeit

    operatorlist = ["+", "-", "*", "/", "//", "%"]
    for operator in operatorlist:
        for a in range(1, 10):
            a = a / 10
            for b in range(1, 10):
                b = b / 10
                DECtime = timeit.timeit("Decimal('" + str(a) + "') " + operator + " Decimal('" + str(b) + "')", setup="from decimal import Decimal")
                NORMtime = timeit.timeit(str(a) + operator + str(b))
                timeslonger = DECtime // NORMtime
                print("Operation: ", str(a) + operator + str(b), "\tNormal operation time: ", NORMtime, "\tDecimal operation time: ", DECtime, "\tSo Decimal operation took ", timeslonger, " times longer")
This shows that Decimal operations consistently take around 100 times longer, for all the operators tested.
[Including exponentiation in the list of operators shows that exponentiation can take 3,000-5,000 times longer. However, this is partly because Decimal() evaluates to far greater precision than normal operations - the default Decimal precision is 28 significant digits - so Decimal("1.5") ** Decimal("1.5") returns 1.837117307087383573647963056, whereas 1.5 ** 1.5 returns 1.8371173070873836. If you limit b to whole numbers by replacing b = b/10 with b = float(b) (which prevents results with many significant figures), the Decimal calculation takes around 100 times longer, as with the other operators.]
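The precision is a property of the decimal context and can be tuned. A minimal sketch, assuming the default context: lowering the precision to roughly what a float carries anyway (~17 significant digits) makes the Decimal result match the float result for the exponentiation example above.

```python
from decimal import Decimal, getcontext

# The default context carries 28 significant digits.
default_prec = getcontext().prec
print(default_prec)                  # 28

# Lower the working precision to ~17 significant digits (about what a
# 64-bit float can represent), reducing the work per operation:
getcontext().prec = 17
result = Decimal('1.5') ** Decimal('1.5')
print(result)                        # 1.8371173070873836, same digits as 1.5 ** 1.5
```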
It could still be argued that the time cost is only significant for users performing billions of calculations, and most users would prioritise getting intelligible results over a time difference which is pretty insignificant in most modest applications.