Search code examples
pythonpandasdataframeinversion

Pandas find duplicates with reversed values between columns


What is the fastest way to find duplicates where value from Column A have been reversed with value from Column B?

For example, if I have a DataFrame with :

     Column A           Column B
0           C                  P
1           D                  C
2           L                  G
3           A                  D
4           B                  P
5           B                  G
6           P                  B
7           J                  T
8           P                  C
9           J                  T

The result will be :

     Column A           Column B
0           C                  P
8           P                  C
4           B                  P
6           P                  B

I tried :

df1 = df
df2 = df
for i in df2.index:
     res = df1[(df1['Column A'] == df2['Column A'][i]) & (df1['Column B'] == df2['Column B'][i])]

But this is very slow and it iterates over the same values...


Solution

  • Use merge with renamed columns DataFrame:

    d = {'Column A':'Column B','Column B':'Column A'}
    df = df.merge(df.rename(columns=d))
    print (df)
      Column A Column B
    0        C        P
    1        B        P
    2        P        B
    3        P        C