I am trying to compare two dataframes using the pandas testing module. I do not need the values to be exactly equal for the test to pass, so I am using the atol parameter, which specifies the absolute tolerance allowed. However, when the values being compared are large, the test passes even though the difference exceeds the tolerance.
Here are two reproducible examples:
import pandas as pd
import pandas.testing
df1 = pd.DataFrame([42])
df2 = pd.DataFrame([41])
# This test fails as expected (it raises AssertionError)
pd.testing.assert_frame_equal(df1, df2, check_exact=False, atol=0.1)
df1 = pd.DataFrame([2006642])
df2 = pd.DataFrame([2006641])
# This test passes, but it should not
pd.testing.assert_frame_equal(df1, df2, check_exact=False, atol=0.1)
Can anyone explain why this happens? Have I misunderstood how atol works?
It turns out that the atol parameter is not used on its own but together with rtol, which defaults to 1e-05. The relative term grows with the magnitude of the values, which is why the larger values I was comparing made the test pass. The comparison applied is:
absolute(a - b) <= (atol + rtol * absolute(b))
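Plugging the numbers from the question into that formula makes the effect visible. The snippet below is only a plain-Python sketch of the inequality, not the pandas implementation; the helper names allowed_difference and within_tolerance are made up for illustration, and the rtol default of 1e-5 is taken from the explanation above.

```python
def allowed_difference(b, atol, rtol=1e-5):
    """Right-hand side of the check: atol + rtol * |b| (hypothetical helper)."""
    return atol + rtol * abs(b)


def within_tolerance(a, b, atol, rtol=1e-5):
    """True when |a - b| <= atol + rtol * |b| (hypothetical helper)."""
    return abs(a - b) <= allowed_difference(b, atol, rtol)


# Small values: the relative term adds almost nothing (about 0.0004),
# so a difference of 1 is rejected.
small_ok = within_tolerance(42, 41, atol=0.1)

# Large values: the relative term alone is about 20,
# so a difference of 1 is accepted.
large_ok = within_tolerance(2006642, 2006641, atol=0.1)

# With rtol=0 only atol remains, so the difference of 1 is rejected again.
large_ok_no_rtol = within_tolerance(2006642, 2006641, atol=0.1, rtol=0)

print(small_ok, large_ok, large_ok_no_rtol)
```

So with the default rtol, the effective tolerance for values around two million is roughly 20 rather than 0.1.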
To get the expected result, rtol needs to be set as well. In my case, to rely exclusively on atol, I need to set rtol to 0:
df1 = pd.DataFrame([2006642])
df2 = pd.DataFrame([2006641])
# This test now fails as expected
pd.testing.assert_frame_equal(df1, df2, check_exact=False, atol=0.1, rtol=0)
Credit to the answer to the question "Is numpy isclose function returning bad answer?"