In a test case we are using np.testing.assert_allclose to determine whether two data sources agree with each other on the mean. But despite containing the same data in a different order, the computed means are slightly different. Here is the shortest working example:
import numpy as np
x = np.array(
[[0.5224021, 0.8526993], [0.6045113, 0.7965965], [0.5053657, 0.86290526], [0.70609194, 0.7081201]],
dtype=np.float32,
)
y = np.array(
[[0.5224021, 0.8526993], [0.70609194, 0.7081201], [0.6045113, 0.7965965], [0.5053657, 0.86290526]],
dtype=np.float32,
)
print("X mean", x.mean(0))
print("Y mean", y.mean(0))
z = x[[0, 3, 1, 2]]
print("Z", z)
print("Z mean", z.mean(0))
np.testing.assert_allclose(z.mean(0), y.mean(0))
np.testing.assert_allclose(x.mean(0), y.mean(0))
With Python 3.10.6 and NumPy 1.24.2, this gives the following output:
X mean [0.58459276 0.8050803 ]
Y mean [0.5845928 0.8050803]
Z [[0.5224021 0.8526993 ]
[0.70609194 0.7081201 ]
[0.6045113 0.7965965 ]
[0.5053657 0.86290526]]
Z mean [0.5845928 0.8050803]
Traceback (most recent call last):
File "/home/nuric/semafind-db/scribble.py", line 19, in <module>
np.testing.assert_allclose(x.mean(0), y.mean(0))
File "/home/nuric/semafind-db/.venv/lib/python3.10/site-packages/numpy/testing/_private/utils.py", line 1592, in assert_allclose
assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
File "/usr/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/nuric/semafind-db/.venv/lib/python3.10/site-packages/numpy/testing/_private/utils.py", line 862, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=1e-07, atol=0
Mismatched elements: 1 / 2 (50%)
Max absolute difference: 5.9604645e-08
Max relative difference: 1.0195925e-07
x: array([0.584593, 0.80508 ], dtype=float32)
y: array([0.584593, 0.80508 ], dtype=float32)
One workaround is to loosen the tolerance for the assertion, but any ideas why this might be happening?
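You can use np.float64 to get more precision. np.float32 carries only about 7 significant decimal digits (its machine epsilon is about 1.19e-07), so a difference of a single unit in the last place can already exceed the default rtol=1e-07 of np.testing.assert_allclose. This code will work: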
import numpy as np
x = np.array(
[[0.5224021, 0.8526993], [0.6045113, 0.7965965], [0.5053657, 0.86290526], [0.70609194, 0.7081201]],
dtype=np.float64,
)
y = np.array(
[[0.5224021, 0.8526993], [0.70609194, 0.7081201], [0.6045113, 0.7965965], [0.5053657, 0.86290526]],
dtype=np.float64,
)
print("X mean", x.mean(0))
print("Y mean", y.mean(0))
z = x[[0, 3, 1, 2]]
print("Z", z)
print("Z mean", z.mean(0))
np.testing.assert_allclose(z.mean(0), y.mean(0))
np.testing.assert_allclose(x.mean(0), y.mean(0))
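If the data has to stay in np.float32 (for memory reasons, say), a possible middle ground is to keep the storage type and only accumulate in double precision through the dtype argument of mean. This is a minimal sketch of that idea, not something the original test does; the names mean_x and mean_y are made up for illustration.

```python
import numpy as np

# Same data as above, still stored as float32.
x = np.array(
    [[0.5224021, 0.8526993], [0.6045113, 0.7965965], [0.5053657, 0.86290526], [0.70609194, 0.7081201]],
    dtype=np.float32,
)
y = x[[0, 3, 1, 2]]  # same rows, different order

# dtype=np.float64 makes mean() accumulate (and return) in double precision,
# so the order-dependent rounding error shrinks to roughly 1e-16 relative.
mean_x = x.mean(0, dtype=np.float64)
mean_y = y.mean(0, dtype=np.float64)

print("X mean", mean_x)
print("Y mean", mean_y)

# Passes with the default rtol=1e-07.
np.testing.assert_allclose(mean_x, mean_y)
```

The two means may still differ in the last bits of the float64 result, but that is far below the default tolerance.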
Another thing you can do is increase the tolerance:
import numpy as np
x = np.array(
[[0.5224021, 0.8526993], [0.6045113, 0.7965965], [0.5053657, 0.86290526], [0.70609194, 0.7081201]],
dtype=np.float32,
)
y = np.array(
[[0.5224021, 0.8526993], [0.70609194, 0.7081201], [0.6045113, 0.7965965], [0.5053657, 0.86290526]],
dtype=np.float32,
)
print("X mean", x.mean(0))
print("Y mean", y.mean(0))
z = x[[0, 3, 1, 2]]
print("Z", z)
print("Z mean", z.mean(0))
np.testing.assert_allclose(z.mean(0), y.mean(0), rtol=1e-6)
np.testing.assert_allclose(x.mean(0), y.mean(0), rtol=1e-6)
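Rather than picking 1e-6 by hand, the tolerance can be tied to the precision of the dtype. The sketch below uses np.finfo and np.spacing to show why the default fails: one float32 step near 0.58 is 5.96e-08, which is exactly the "Max absolute difference" in the traceback. The factor of 4 is an arbitrary, hypothetical safety margin, not a NumPy recommendation.

```python
import numpy as np

# Machine epsilon of float32: the gap between 1.0 and the next float32 (2**-23).
eps = np.finfo(np.float32).eps

# Gap between 0.5845928 and the next representable float32 (2**-24).
ulp = np.spacing(np.float32(0.5845928))

print("float32 eps:", eps)
print("one ulp near 0.58:", ulp)
print("relative size of that ulp:", ulp / np.float32(0.5845928))

x = np.array(
    [[0.5224021, 0.8526993], [0.6045113, 0.7965965], [0.5053657, 0.86290526], [0.70609194, 0.7081201]],
    dtype=np.float32,
)
y = x[[0, 3, 1, 2]]

# Hypothetical choice: allow a few float32 rounding errors.
rtol = 4 * eps
np.testing.assert_allclose(x.mean(0), y.mean(0), rtol=rtol)
```

A tolerance expressed as a small multiple of eps keeps working if the test is later switched to another float dtype, as long as eps is taken from that dtype.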
Finally, this error happens because floating-point addition is not associative. The sum is done in a different order for x than for y and z (which hold the rows in the same order), and every intermediate result is rounded to np.float32, so the final means can differ in the last bit. You can see that by printing more decimal places:
import numpy as np
np.set_printoptions(formatter={'float': lambda x: "{0:0.10f}".format(x)})
x = np.array(
[[0.5224021, 0.8526993], [0.6045113, 0.7965965], [0.5053657, 0.86290526], [0.70609194, 0.7081201]],
dtype=np.float32,
)
y = np.array(
[[0.5224021, 0.8526993], [0.70609194, 0.7081201], [0.6045113, 0.7965965], [0.5053657, 0.86290526]],
dtype=np.float32,
)
print("X mean", x.mean(0))
print("Y mean", y.mean(0))
z = x[[0, 3, 1, 2]]
print("Z", z)
print("Z mean", z.mean(0))
np.testing.assert_allclose(z.mean(0), y.mean(0), rtol=1e-6)
np.testing.assert_allclose(x.mean(0), y.mean(0), rtol=1e-6)
Which will print:
X mean [0.5845927596 0.8050802946]
Y mean [0.5845928192 0.8050802946]
Z [[0.5224021077 0.8526992798]
[0.7060919404 0.7081201077]
[0.6045113206 0.7965965271]
[0.5053657293 0.8629052639]]
Z mean [0.5845928192 0.8050802946]
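To see the order dependence in isolation, the first column can be summed by hand in both orders. This sketch assumes the rows are added one after another, which may not match NumPy's internal summation strategy for every version or array layout; sequential_sum is a helper written for this illustration, and math.fsum is used only as an exactly rounded reference.

```python
import math
import numpy as np

# First column of x, and the same values in the row order used by y.
col_x = np.array([0.5224021, 0.6045113, 0.5053657, 0.70609194], dtype=np.float32)
col_y = col_x[[0, 3, 1, 2]]

def sequential_sum(values):
    """Add the values left to right, rounding to float32 after every step."""
    total = np.float32(0.0)
    for v in values:
        total = total + v  # float32 + float32 stays float32
    return total

sum_x = sequential_sum(col_x)
sum_y = sequential_sum(col_y)

# Reference: sum of the same float32 values computed without intermediate rounding.
exact = math.fsum(float(v) for v in col_x)

print("sum in x order:", repr(float(sum_x)))
print("sum in y order:", repr(float(sum_y)))
print("exact sum     :", repr(exact))
print("difference    :", float(sum_y) - float(sum_x))
```

The two float32 sums land on neighbouring representable values, and dividing both by 4 gives the one-ulp gap between the two means.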