python, pandas, dataframe

Compare content of two pandas dataframes even if the rows are differently ordered


I have two pandas dataframes that contain the same columns, but their rows are in a different order. My goal is to compare the two dataframes easily and confirm that they both contain the same rows.

I tried the "equals" method, but I seem to be missing something, because the result is not what I expected:

import pandas as pd

df_1 = pd.DataFrame({1: [10, 15, 30], 2: [20, 25, 40]})
df_2 = pd.DataFrame({1: [30, 10, 15], 2: [40, 20, 25]})
df_1.equals(df_2)

I expected the result to be True, because both dataframes contain the same rows, just in a different order, but it returns False.


Solution

  • DataFrame.equals compares values position by position and also compares the index labels, so the rows have to line up first. Sort both DataFrames by all of their columns with DataFrame.sort_values, then call DataFrame.reset_index with drop=True so that both get the default index:

    df11 = df_1.sort_values(by=df_1.columns.tolist()).reset_index(drop=True)
    df21 = df_2.sort_values(by=df_2.columns.tolist()).reset_index(drop=True)
    print(df11.equals(df21))
    True
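
The sort-and-reset approach can be wrapped in a small helper if the comparison is needed in more than one place. This is only a sketch: the name same_rows and the third frame df_3 are made up for illustration, and it assumes both frames have the same column labels in the same order and values that can be sorted.

```python
import pandas as pd


def same_rows(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Return True if both frames hold the same rows, ignoring row order.

    Hypothetical helper, not part of pandas.
    """
    # Assumption: the column labels must match exactly, including their order.
    if list(left.columns) != list(right.columns):
        return False
    cols = left.columns.tolist()
    # Sort by every column so identical rows end up in identical positions,
    # then drop the old index so only the values are compared.
    left_sorted = left.sort_values(by=cols).reset_index(drop=True)
    right_sorted = right.sort_values(by=cols).reset_index(drop=True)
    return left_sorted.equals(right_sorted)


df_1 = pd.DataFrame({1: [10, 15, 30], 2: [20, 25, 40]})
df_2 = pd.DataFrame({1: [30, 10, 15], 2: [40, 20, 25]})
df_3 = pd.DataFrame({1: [30, 10, 15], 2: [40, 20, 99]})  # one value differs

print(same_rows(df_1, df_2))  # True
print(same_rows(df_1, df_3))  # False
```

Because the rows are sorted rather than deduplicated, repeated rows still count: a frame that contains a row twice does not match a frame that contains it once. Note that equals also requires matching dtypes, so an int64 column and a float64 column with the same numbers compare as different.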