Pandas df.equals() returning False on identical dataframes?


Let df_1 and df_2 be:

In [1]: import pandas as pd
   ...: df_1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
   ...: df_2 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

In [2]: df_1
Out[2]:
   a  b
0  1  4
1  2  5
2  3  6

We add a row r to df_1:

In [3]: r = pd.DataFrame({'a': ['x'], 'b': ['y']})
   ...: df_1 = pd.concat([df_1, r], ignore_index=True)  # DataFrame.append was removed in pandas 2.0

In [4]: df_1
Out[4]:
   a  b
0  1  4
1  2  5
2  3  6
3  x  y

We now remove the added row from df_1 and get the original df_1 back again:

In [5]: df_1 = pd.concat([df_1, r]).drop_duplicates(keep=False)

In [6]: df_1
Out[6]:
   a  b
0  1  4
1  2  5
2  3  6

In [7]: df_2
Out[7]:
   a  b
0  1  4
1  2  5
2  3  6

Although df_1 and df_2 look identical, equals() returns False.

In [8]: df_1.equals(df_2)
Out[8]: False

I did some research on SO but could not find a related question. Am I doing something wrong? How do I get the correct result in this case? (df_1==df_2).all().all() returns True, but it is not suitable when df_1 and df_2 have different lengths.
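For reference, one way to make the elementwise check safe for frames of different lengths is to compare shape and labels before using ==, because == raises a ValueError for DataFrames that are not identically labeled. This is only a sketch; the helper name frames_equal_ignoring_dtype is hypothetical, and it deliberately ignores dtypes, unlike DataFrame.equals().

```python
import pandas as pd


def frames_equal_ignoring_dtype(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical helper: compare labels and values, but not dtypes."""
    # Different shapes can never be equal; bail out before comparing values.
    if left.shape != right.shape:
        return False
    # `left == right` raises ValueError unless index and columns match,
    # so check the labels explicitly first.
    if not (left.index.equals(right.index) and left.columns.equals(right.columns)):
        return False
    # Elementwise comparison; an object column holding 1, 2, 3 compares
    # equal to an int64 column holding 1, 2, 3.
    return bool((left == right).all().all())


df_a = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df_b = df_a.astype(object)  # same values, different dtypes

print(frames_equal_ignoring_dtype(df_a, df_b))           # True
print(frames_equal_ignoring_dtype(df_a, df_a.iloc[:2]))  # False (different length)
print(df_a.equals(df_b))                                 # False (dtypes differ)
```

Note that NaN never compares equal with ==, so two frames with NaN in the same positions would be reported as different by this helper, whereas DataFrame.equals() treats those as equal.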


Solution

  • Use pandas.testing.assert_frame_equal(df_1, df_2), which by default (check_dtype=True) also checks that the dtypes match.

    (In this case it picks up that the dtypes of your columns changed from int64 to object when you appended, then deleted, a string row; pandas does not automatically downcast the columns back to a narrower dtype. DataFrame.equals() also requires the elements in each column to have the same dtype, which is why it returned False.)

    AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="a") are different
    
    Attribute "dtype" are different
    [left]:  object
    [right]: int64
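To actually get True back, the object columns need to be converted to int64 again. The sketch below reproduces the question's steps (using pd.concat in place of the removed DataFrame.append) and shows two possible fixes: infer_objects(), which re-infers a better dtype for object columns, and astype() with the reference frame's dtypes. Which one fits depends on whether you want pandas to guess the dtypes or want to pin them to a known frame.

```python
import pandas as pd

df_1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df_2 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
r = pd.DataFrame({'a': ['x'], 'b': ['y']})

# Add and then remove the string row, as in the question.
df_1 = pd.concat([df_1, r], ignore_index=True)
df_1 = pd.concat([df_1, r]).drop_duplicates(keep=False)

# The values match, but both columns are now object instead of int64.
print(df_1.dtypes.tolist(), df_2.dtypes.tolist())
print(df_1.equals(df_2))  # False

# Option 1: let pandas re-infer dtypes for the object columns.
restored = df_1.infer_objects()
print(restored.equals(df_2))  # True

# Option 2: cast explicitly to the reference frame's dtypes.
cast = df_1.astype(df_2.dtypes.to_dict())
print(cast.equals(df_2))  # True

# assert_frame_equal returns None when the frames match and raises
# AssertionError otherwise; its default check_index_type="equiv" treats
# the int64 index left by drop_duplicates as equivalent to a RangeIndex.
pd.testing.assert_frame_equal(restored, df_2)
```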