For example, with the code below the x tick labels show 0.6000000000000001 instead of 0.6:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(0.0,1.2,0.2)
y = np.arange(0.0,1.2,0.2)
labels = np.arange(0.0,1.2,0.2)
plt.plot(x, y)
plt.xticks(x, labels)
plt.show()
I had to use np.around(np.arange(0.0, 1.2, 0.2), 1) to avoid it, but if I just run np.arange(0.0, 1.2, 0.2), it gives array([0. , 0.2, 0.4, 0.6, 0.8, 1. ]). Why is it different?
Also, the y axis does not use 0.6000000000000001 as a label, which is also weird.
This issue is due to IEEE 754 floating-point precision, but I think there should be a good way to round decimal numbers.
The same floating-point representation that represents 0.6 matches a whole interval of real numbers. For instance, all real numbers from 0.59999999999999993 to 0.60000000000000003 share the same float64 representation.
Just try it:
import struct
struct.pack('d', 0.59999999999999992)
struct.pack('d', 0.59999999999999993)
struct.pack('d', 0.59999999999999994)
struct.pack('d', 0.59999999999999995)
struct.pack('d', 0.59999999999999996)
struct.pack('d', 0.59999999999999997)
struct.pack('d', 0.59999999999999998)
struct.pack('d', 0.59999999999999999)
struct.pack('d', 0.60000000000000000)
struct.pack('d', 0.60000000000000001)
struct.pack('d', 0.60000000000000002)
struct.pack('d', 0.60000000000000003)
struct.pack('d', 0.60000000000000004)
As you can see, all but the first and last numbers have the same representation.
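To see where the edges of that interval are exactly, here is a small standard-library sketch (not from the original answer). It assumes the usual round-to-nearest conversion, where each edge is the midpoint between a float64 and its neighbour; math.nextafter (Python 3.9+) gives the neighbours and fractions.Fraction does the arithmetic exactly. The helper name rounding_interval is hypothetical.

```python
import math
import struct
from fractions import Fraction


def rounding_interval(f):
    """Exact bounds of the real numbers that round to the float64 value f.

    Each bound is the midpoint between f and a neighbouring float64
    (math.nextafter requires Python 3.9+). A real lying exactly on a bound
    is a tie, which IEEE 754 resolves with round-half-to-even.
    """
    below = math.nextafter(f, -math.inf)
    above = math.nextafter(f, math.inf)
    lo = (Fraction(below) + Fraction(f)) / 2
    hi = (Fraction(f) + Fraction(above)) / 2
    return lo, hi


lo, hi = rounding_interval(0.6)

# The same probes as the struct test, kept as exact decimal strings
probes = ["0.5" + "9" * 15 + str(d) for d in range(2, 10)]
probes += ["0.6" + "0" * 15 + str(d) for d in range(0, 5)]

for s in probes:
    inside = lo <= Fraction(s) <= hi  # exact comparison, no float rounding
    same_bits = struct.pack("<d", float(s)) == struct.pack("<d", 0.6)
    print(s, inside, same_bits)
```

Both columns agree for every probe: a decimal string maps to the float64 for 0.6 exactly when it lies between the two midpoints.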
But that is not the only problem. That float64 value, which stands for any real between 0.59999999999999993 and 0.60000000000000003, is displayed by Python as the "roundest" number of that interval (the shortest decimal string that converts back to the same float64), namely 0.6. This is why, when you type 0.6 in your Python interpreter, it doesn't reply 0.59999999999999993 or 0.59999999999999999. (That would have been an easier way to run the test than struct, but I wanted to introduce struct: when you type 0.59999999999999994, Python replies 0.6, but when you type 0.59999999999999992, it says 0.5999999999999999.)
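The different ways Python can show one float64 can be compared directly. A minimal sketch using only standard-library calls: repr gives the shortest round-tripping string, the ".17g" format forces 17 significant digits, and decimal.Decimal shows the exact binary value that is actually stored.

```python
from decimal import Decimal

x = 0.6
print(repr(x))      # prints 0.6 (shortest string that converts back to x)
print(f"{x:.17g}")  # prints 0.59999999999999998 (17 significant digits)
print(Decimal(x))   # prints 0.59999999999999997779553950749686919152736663818359375

# Anything typed from the interval gives back the same float, so repr shows 0.6
print(repr(0.59999999999999994))  # prints 0.6
# Just outside the interval is a different float64 with a different shortest repr
print(repr(0.59999999999999992))  # prints 0.5999999999999999
```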
The problem is that 0.2 doesn't have an exact representation either.
All real numbers from about 0.199999999999999997 to about 0.200000000000000025 share the same representation, and that representation is the exact representation only of 0.20000000000000001110223024625156540423631668090820...
I know this because:
import struct
b = struct.pack('<d', 0.2)
# Reinterpret the 8 bytes as an unsigned 64-bit integer ('l' is only 4 bytes on some platforms, e.g. Windows)
x = struct.unpack('<Q', b)[0]
exponent = (x >> 52) & (2**11 - 1)  # 1020, i.e. -3 once the bias of 1023 is subtracted
mantissa = x & (2**52 - 1)  # 2702159776422298
mantissa += 2**52  # add the implicit leading 1 of float64
# Check: mantissa/2**52*2**-3 should be ~0.2
mantissa / 2**52 * 2**(exponent - 1023)  # 0.2
# To get the digits that Python's float display doesn't show,
# take advantage of Python's arbitrary-precision integers and compute
# that value times 10**50 using exact integer operations
10**50 * mantissa // 2**(52 + 1023 - exponent)
# 20000000000000001110223024625156540423631668090820
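A shorter cross-check of those digits, as a sketch relying only on standard-library calls (float.hex, float.as_integer_ratio, decimal.Decimal, fractions.Fraction and math.nextafter from Python 3.9+) instead of unpacking the bits by hand. It also computes the edges of the interval of reals that round to 0.2, assuming round-to-nearest, so the edges are midpoints between neighbouring float64 values.

```python
import math
from decimal import Decimal
from fractions import Fraction

x = 0.2
print(x.hex())               # prints 0x1.999999999999ap-3 (hex mantissa, binary exponent -3)
print(x.as_integer_ratio())  # prints (3602879701896397, 18014398509481984), i.e. n / 2**54
print(Decimal(x))            # every decimal digit of the stored value

# The same 50-digit integer as the manual computation, via exact fractions
print(10**50 * Fraction(x) // 1)

# Edges of the interval that rounds to x: midpoints with the neighbouring floats
lo = (Fraction(math.nextafter(x, 0.0)) + Fraction(x)) / 2
hi = (Fraction(x) + Fraction(math.nextafter(x, 1.0))) / 2
print(Decimal(lo.numerator) / lo.denominator)  # roughly 0.1999999999999999972...
print(Decimal(hi.numerator) / hi.denominator)  # roughly 0.2000000000000000249...

# 0.19999999999999998 lies below lo, so it is stored as a different float64
print(float("0.19999999999999998") == x)  # prints False
```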
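Now, if you multiply that number by 3, you get exactly 0.60000000000000003330669073875469621270895004272460...
That is exactly the upper edge of the interval that maps to 0.6: it lies halfway between the float64 for 0.6 and the next float64 up. IEEE 754 breaks such ties toward the neighbour with an even mantissa, which here is the upper one, 0.6000000000000001.
In other words, 0.2*3 and 0.6 don't have the same float64 representation.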
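The tie can be checked with exact rational arithmetic. This sketch (standard library only, math.nextafter needs Python 3.9+) compares the exact product 3 × (stored 0.2) with the midpoint between 0.6 and the next float64, and prints both floats in hex so the odd and even last mantissa digits are visible.

```python
import math
from fractions import Fraction

stored_02 = Fraction(0.2)                # exact value of the float64 nearest to 0.2
exact_product = 3 * stored_02            # 3 * 0.2 before any rounding
f06 = 0.6                                # float64 nearest to 0.6
next_up = math.nextafter(f06, math.inf)  # next float64 above it
midpoint = (Fraction(f06) + Fraction(next_up)) / 2

print(exact_product == midpoint)  # prints True: the exact product is a tie
print(f06.hex(), next_up.hex())   # prints 0x1.3333333333333p-1 0x1.3333333333334p-1
print(3 * 0.2 == next_up)         # prints True: the tie went to the even mantissa (...4)
print(3 * 0.2)                    # prints 0.6000000000000001
```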
Now, when a numpy array is printed, its values are rounded for display:
np.array([1.234567890123])
⇒
array([1.23456789])
This is just a display choice of numpy (which can be tweaked with set_printoptions); it is how the array's __repr__ method works.
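You can check that the element itself keeps all its digits:
np.array([1.234567890123])[0]
⇒
1.234567890123
(NumPy 2 and later display this scalar as np.float64(1.234567890123).)
That is why you didn't see the numerical error when printing the range: all the digits are there, they are just not printed by the numpy array's __repr__.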
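A sketch of how to look at the hidden digits without changing NumPy's print settings globally. It assumes NumPy 1.15 or later, where np.printoptions is available as a context manager that restores the previous settings on exit.

```python
import numpy as np

a = np.arange(0, 1.2, 0.2)
print(repr(a))  # default precision=8 hides the error in the 4th element
print(a[3])     # indexing gives the full value: 0.6000000000000001

with np.printoptions(precision=17):  # temporarily allow up to 17 digits
    print(repr(a))

print(a.tolist())  # plain Python floats, printed with their full repr
```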
The same goes for 0.6:
np.arange(0,1.2,0.2)
#array([0. , 0.2, 0.4, 0.6, 0.8, 1. ])
np.arange(0,1.2,0.2)[3]
#0.6000000000000001
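If you would rather have tick positions that are exactly the floats nearest to 0.2, 0.4, 0.6, ..., one option not mentioned in the answer is to build the range from integers and divide once, so each element comes from a single correctly rounded division. This sketch assumes NumPy fills a float arange roughly as start + i*step, which is where the 3*0.2 above comes from, and compares it with the np.around workaround from the question.

```python
import numpy as np

step_based = np.arange(0.0, 1.2, 0.2)  # element i is roughly 0.0 + i*0.2
rounded = np.around(step_based, 1)     # the question's workaround
integer_based = np.arange(6) / 5       # i/5: one correctly rounded division each

print(step_based.tolist())     # the 4th element is 0.6000000000000001
print(rounded.tolist())        # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
print(integer_based.tolist())  # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
```

Either way, the tick values are still binary approximations; they are just the same approximations that the literals 0.2, 0.4, 0.6 produce, so str() prints them short.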
As for how to avoid it:
- Don't call xticks at all if you just want "0.6": without any xticks specification, the behaviour you expect is already the default one.
- Use xticks to impose the tick positions, but don't set labels, and let the default formatter choose how they are printed (so you choose which ticks are printed, not how):
plt.xticks(x)
- Or, in addition, choose the formatter yourself:
import matplotlib.ticker as tk
plt.gca().xaxis.set_major_formatter(tk.FormatStrFormatter('%.2f'))
So with xticks you choose where to print labels, and with the formatter how to print them.
- If you really want to use xticks to fix both the ticks and their labels, pass explicit strings as labels (what would be the point of redoing the formatter's job if you still don't choose yourself how the non-string objects you passed are printed?):
plt.xticks(x, [f'{t:.2f}' for t in x])
I think this should be avoided anyway, because it redoes the formatter's job; I only do it when I need some exotic labels, such as xticks(x, ['zero', '1/5', '40%', '3/5', '80%', 'full']).
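To compare these options side by side, here is a headless sketch (not part of the original answer). It assumes matplotlib's Agg backend and uses the Axes methods set_xticks / set_xticklabels as object-oriented equivalents of plt.xticks; the helper names tick_labels, float_labels, positions_only, positions_and_formatter and explicit_strings are hypothetical. It also assumes matplotlib turns a non-string label into text with str(), which is where 0.6000000000000001 comes from.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import matplotlib.ticker as tk
import numpy as np

x = np.arange(0.0, 1.2, 0.2)


def tick_labels(configure):
    """Plot x against itself, let `configure` set up the axes, return the x tick texts."""
    fig, ax = plt.subplots()
    ax.plot(x, x)
    configure(ax)
    fig.canvas.draw()  # tick label texts are filled in when the figure is drawn
    labels = [t.get_text() for t in ax.get_xticklabels()]
    plt.close(fig)
    return labels


def float_labels(ax):
    # What the question does: floats as labels, each converted to text as-is
    ax.set_xticks(x)
    ax.set_xticklabels(x)


def positions_only(ax):
    # Only fix where the ticks go; the default ScalarFormatter decides how to print
    ax.set_xticks(x)


def positions_and_formatter(ax):
    # Fix where the ticks go and choose the format explicitly
    ax.set_xticks(x)
    ax.xaxis.set_major_formatter(tk.FormatStrFormatter("%.2f"))


def explicit_strings(ax):
    # Fix both positions and labels, passing already-formatted strings
    ax.set_xticks(x)
    ax.set_xticklabels([f"{t:.2f}" for t in x])


for configure in (float_labels, positions_only, positions_and_formatter, explicit_strings):
    print(configure.__name__, tick_labels(configure))
```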