Search code examples
pythonpandasdataframeindexingrows

How to compare two data frame and get the unmatched rows using python?


I have two data frames, df1 and df2. Now, df1 contains 6 records and df2 contains 4 records. I want to get the unmatched records out of it. I tried it but getting an error ValueError: Can only compare identically-labelled DataFrame objects I guess this is due to the length of df as the df1 has 6 and df2 has 4 but how do I compare them both and get the unmatched rows?

code

df1=
  a  b  c
0 1  2  3
1 4  5  6
2 3  5  5
3 5  6  7
4 6  7  8
5 6  6  6


df2 =
  a  b  c
0 3  5  5
1 5  6  7
2 6  7  8
3 6  6  6

index = (df != df2).any(axis=1)
df3 = df.loc[index]

which gives:

ValueError: Can only compare identically-labelled DataFrame objects

Expected output:

 a  b  c
0 1  2  3
1 4  5  6

I know that the error is due to the length but is there any way where we can compare two data frames and get the unmatched records out of it?


Solution

  • MultiIndex.from_frame + isin

    We can use MultiIndex.from_frame on both df1 and df2 to create the corresponding multiindices, then use isin to test the membership of the index created from df1 in index created from df2 to create a boolean mask which can be then used to filter the non matching rows.

    i1 = pd.MultiIndex.from_frame(df1)
    i2 = pd.MultiIndex.from_frame(df2)
    df1[~i1.isin(i2)]
    

    Result

       a  b  c
    0  1  2  3
    1  4  5  6