Pandas DataFrames with NaNs equality comparison


While unit testing some functions, I'm trying to check whether two DataFrames are equal using pandas:

ipdb> expect
                            1   2
2012-01-01 00:00:00+00:00 NaN   3
2013-05-14 12:00:00+00:00   3 NaN

ipdb> df
identifier                  1   2
timestamp
2012-01-01 00:00:00+00:00 NaN   3
2013-05-14 12:00:00+00:00   3 NaN

ipdb> df[1][0]
nan

ipdb> df[1][0], expect[1][0]
(nan, nan)

ipdb> df[1][0] == expect[1][0]
False

ipdb> df[1][1] == expect[1][1]
True

ipdb> type(df[1][0])
<type 'numpy.float64'>

ipdb> type(expect[1][0])
<type 'numpy.float64'>

ipdb> (list(df[1]), list(expect[1]))
([nan, 3.0], [nan, 3.0])

ipdb> df1, df2 = (list(df[1]), list(expect[1])) ;; df1 == df2
False

Given that I'm trying to test all of expect against all of df, including the NaN positions, what am I doing wrong?

What is the simplest way to compare equality of Series/DataFrames including NaNs?


Solution

  • You can use assert_frame_equal with check_names=False (so that the names of the index and columns are not checked). It raises an AssertionError if the frames are not equal:

    In [11]: from pandas.testing import assert_frame_equal
    
    In [12]: assert_frame_equal(df, expected, check_names=False)
    

    You can wrap this in a function with something like:

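    def frames_equal(df, expected):  # hypothetical helper name
        # True when the frames match; NaNs in the same positions count as equal
        try:
            assert_frame_equal(df, expected, check_names=False)
            return True
        except AssertionError:
            return False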
    

    More recent versions of pandas provide this as the .equals method, which returns a boolean and treats NaNs in the same positions as equal:

    df.equals(expected)
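
As for why the original comparison fails: NaN never compares equal to itself, so == reports False at every NaN position even when both frames hold NaN there. The following is a minimal sketch that rebuilds frames resembling the ones in the question; the make_frame helper and the exact values are assumptions for illustration, not the asker's real data.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal


def make_frame():
    # Hypothetical stand-in for the frames shown in the question.
    index = pd.to_datetime(
        ["2012-01-01 00:00:00", "2013-05-14 12:00:00"], utc=True
    )
    return pd.DataFrame({1: [np.nan, 3.0], 2: [3.0, np.nan]}, index=index)


expect = make_frame()
# Same values, but with named index and columns, as in the question's df.
df = make_frame().rename_axis(index="timestamp", columns="identifier")

# NaN is not equal to itself, so a plain == comparison cannot succeed here.
nan_equals_itself = np.nan == np.nan

# Element-wise == is False wherever either side is NaN.
elementwise_all_equal = bool((df == expect).all().all())

# Both of these treat NaNs in matching positions as equal.
assert_frame_equal(df, expect, check_names=False)  # passes silently
same_by_equals = bool(df.equals(expect))

# A frame whose NaNs were replaced no longer matches.
differs_after_fill = bool(df.equals(expect.fillna(0.0)))

print(nan_equals_itself, elementwise_all_equal, same_by_equals, differs_after_fill)
```

In a unit test, assert_frame_equal is usually the better choice because its error message points at what differs, while .equals only gives a yes/no answer.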