Many questions have been asked on Stack Overflow and elsewhere about Python's confusing behaviour with calculations using floats, which often return a result that is clearly wrong by a tiny amount. The explanation invariably links to the official Python docs on floating-point arithmetic, but a practical, simple solution is not usually provided.
It isn't just the error (which is usually negligible) - it is more the mess and inelegance of getting a result like 3.999999999999999 for a simple sum like 8.7 - 4.7.
I have written a simple solution for this, and my question is: why isn't something like this automatically implemented by Python behind the scenes?
The basic concept is to convert each float into an integer, perform the operation, and then convert back appropriately to a float. The difficulties explained in the linked doc apply only to floats, not to ints, which is why this works. Here is the code:
    def justwork(x, operator, y):
        # Count the decimal places of each operand
        numx = numy = 0
        if "." in str(x):
            numx = len(str(x)) - str(x).find(".") - 1
        if "." in str(y):
            numy = len(str(y)) - str(y).find(".") - 1
        num = max(numx, numy)
        factor = 10 ** num
        # Scale both operands up to whole numbers
        newx = x * factor
        newy = y * factor
        if operator == "%":
            ans1 = x % y
            ans = (newx % newy) / factor
        elif operator == "*":
            ans1 = x * y
            ans = (newx * newy) / (factor ** 2)
        elif operator == "-":
            ans1 = x - y
            ans = (newx - newy) / factor
        elif operator == "+":
            ans1 = x + y
            ans = (newx + newy) / factor
        elif operator == "/":
            ans1 = x / y
            ans = newx / newy
        elif operator == "//":
            ans1 = x // y
            ans = newx // newy
        return (ans, ans1)
This is admittedly rather inelegant and could probably be improved with a bit of thought, but it gets the job done. The function returns a tuple with the correct result (by converting to integer), and the incorrect result (automatically provided). Here are examples of how this provides accurate results, as opposed to doing it normally.
    # code                             # returns tuple with (correct, incorrect) result
    print(justwork(0.7, "%", 0.1))     # (0.0, 0.09999999999999992)
    print(justwork(0.7, "*", 0.1))     # (0.07, 0.06999999999999999)
    print(justwork(0.7, "-", 0.2))     # (0.5, 0.49999999999999994)
    print(justwork(0.7, "+", 0.1))     # (0.8, 0.7999999999999999)
    print(justwork(0.7, "/", 0.1))     # (7.0, 6.999999999999999)
    print(justwork(0.7, "//", 0.1))    # (7.0, 6.0)
TLDR: Essentially the question is: why are floats stored as base-2 binary fractions (which are inherently imprecise) when they could be stored the same way as integers (which just work)?
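The "inherently imprecise" part can be seen directly from Python itself: a float is stored as a ratio whose denominator is a power of two, so a decimal fraction like 0.1 can only be stored as the nearest such ratio. As a quick illustration:

```python
# float.as_integer_ratio() exposes the exact binary fraction a float stores.
# 0.1 has no exact power-of-two representation, so Python keeps the nearest one:
num, den = (0.1).as_integer_ratio()
print(num, den)                      # 3602879701896397 36028797018963968
print(den == 2 ** 55)                # True: the denominator is a power of two
# 0.5 = 1/2 IS a binary fraction, so it is stored exactly:
print((0.5).as_integer_ratio())      # (1, 2)
```

So the imprecision is baked into the representation itself, before any arithmetic happens.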
Three points:

1. Python has a built-in decimal module which always provides accurate answers (even in cases where the justwork() function in the question fails to).
2. Using the decimal module slows things down considerably - taking roughly 100 times longer.
3. The default approach therefore sacrifices accuracy to prioritise speed. [Whether making this the default is the right approach is debatable.]

To illustrate these three points, consider the following functions, loosely based on the one in the question:
    def justdoesntwork(x, operator, y):
        numx = numy = 0
        if "." in str(x):
            numx = len(str(x)) - str(x).find(".") - 1
        if "." in str(y):
            numy = len(str(y)) - str(y).find(".") - 1
        factor = 10 ** max(numx, numy)
        newx = x * factor
        newy = y * factor
        if operator == "+":    myAns = (newx + newy) / factor
        elif operator == "-":  myAns = (newx - newy) / factor
        elif operator == "*":  myAns = (newx * newy) / (factor ** 2)
        elif operator == "/":  myAns = newx / newy
        elif operator == "//": myAns = newx // newy
        elif operator == "%":  myAns = (newx % newy) / factor
        return myAns
and
    from decimal import Decimal

    def doeswork(x, operator, y):
        if operator == "+":    decAns = Decimal(str(x)) + Decimal(str(y))
        elif operator == "-":  decAns = Decimal(str(x)) - Decimal(str(y))
        elif operator == "*":  decAns = Decimal(str(x)) * Decimal(str(y))
        elif operator == "/":  decAns = Decimal(str(x)) / Decimal(str(y))
        elif operator == "//": decAns = Decimal(str(x)) // Decimal(str(y))
        elif operator == "%":  decAns = Decimal(str(x)) % Decimal(str(y))
        return decAns
and then looping through many values to find where myAns differs from decAns:
    operatorlist = ["+", "-", "*", "/", "//", "%"]
    for a in range(1, 1000):
        x = a / 10
        for b in range(1, 1000):
            y = b / 10
            for operator in operatorlist:
                myAns = justdoesntwork(x, operator, y)
                decAns = doeswork(x, operator, y)
                if float(decAns) != myAns and len(str(decAns)) < 5:
                    print(x, "\t", operator, " \t ", y, " \t= ", decAns, "\t\t{", myAns, "}")
=> this goes through all values to 1 d.p., from 0.1 to 99.9, and indeed fails to find any values where myAns differs from decAns.
However, if it is changed to 2 d.p. (i.e. either x = a/100 or y = b/100), many examples appear. For example, 0.1 + 1.09: this can easily be checked by typing ((0.1*100) + (1.09*100)) / 100 in the console, which uses the basic method of the question and returns 1.1900000000000002 instead of 1.19. The source of the error is 1.09*100, which returns 109.00000000000001. [Simply typing 0.1 + 1.09 gives the same error.] So the approach suggested in the question doesn't always work.
Using Decimal(), however, returns the correct answer: Decimal('0.1') + Decimal('1.09') returns Decimal('1.19').

[Note: don't forget to enclose the 0.1 and 1.09 in quotes. If you don't, Decimal(0.1) + Decimal(1.09) returns Decimal('1.190000000000000085487172896'), because it starts from the float 0.1, which is already stored inaccurately, and converts that inaccurate value to Decimal - GIGO. Decimal() has to be fed a string. Taking a float, converting it to a string and from there to Decimal does work, though; the problem only arises when going directly from float to Decimal.]
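The three construction routes side by side make the point clear:

```python
from decimal import Decimal

# From a string: the intended decimal value is preserved exactly.
print(Decimal('0.1') + Decimal('1.09'))        # 1.19

# Directly from a float: the float's binary error is faithfully copied in.
print(Decimal(0.1) + Decimal(1.09))            # 1.190000000000000085487172896

# Float -> str -> Decimal: str() gives the short decimal form, so this works too.
print(Decimal(str(0.1)) + Decimal(str(1.09)))  # 1.19
```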
In terms of time cost, run this:
    import timeit

    operatorlist = ["+", "-", "*", "/", "//", "%"]
    for operator in operatorlist:
        for a in range(1, 10):
            a = a / 10
            for b in range(1, 10):
                b = b / 10
                DECtime = timeit.timeit("Decimal('" + str(a) + "') " + operator + " Decimal('" + str(b) + "')", setup="from decimal import Decimal")
                NORMtime = timeit.timeit(str(a) + operator + str(b))
                timeslonger = DECtime // NORMtime
                print("Operation: ", str(a) + operator + str(b), "\tNormal operation time: ", NORMtime, "\tDecimal operation time: ", DECtime, "\tSo Decimal operation took ", timeslonger, " times longer")
This shows that Decimal operations consistently take around 100 times longer, for all the operators tested.
[Including exponentiation in the list of operators shows that exponentiation can take 3,000-5,000 times longer. However, this is partly because Decimal() evaluates to far greater precision than normal operations - the default Decimal precision is 28 significant digits - so Decimal("1.5") ** Decimal("1.5") returns 1.837117307087383573647963056, whereas 1.5 ** 1.5 returns 1.8371173070873836. If you limit b to whole numbers by replacing b = b/10 with b = float(b) (which prevents results with many significant figures), the Decimal calculation takes around 100 times longer, as with the other operators.]
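The precision is a property of the decimal context and can be tuned. A minimal sketch, assuming the default context: lowering the precision to roughly what a float carries anyway (~17 significant digits) makes the Decimal result match the float result for the exponentiation example above.

```python
from decimal import Decimal, getcontext

# The default context carries 28 significant digits.
default_prec = getcontext().prec
print(default_prec)                  # 28

# Lower the working precision to ~17 significant digits (about what a
# 64-bit float can represent), reducing the work per operation:
getcontext().prec = 17
result = Decimal('1.5') ** Decimal('1.5')
print(result)                        # 1.8371173070873836, same digits as 1.5 ** 1.5
```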
It could still be argued that the time cost is only significant for users performing billions of calculations, and most users would prioritise getting intelligible results over a time difference which is pretty insignificant in most modest applications.