Tags: python, pandas, drop-duplicates

How to find the columns that have the same values in a pandas DataFrame


I am trying to find which columns in a data frame hold the same values as another column. In R there is whichAreInDouble for this, and I am trying to implement the same thing in Python.

df  =   
a b c d e f g h i   
1 2 3 4 1 2 3 4 5  
2 3 4 5 2 3 4 5 6  
3 4 5 6 3 4 5 6 7

It should give me the pairs of columns that have the same values, like:

a, e are equal
b, f are equal
c, g are equal

Solution

  • Try itertools.combinations to visit every pair of columns, and Series.equals to compare them:

    from itertools import combinations
    
    # Iterating over df yields its column labels; keep the pairs whose values match.
    [(i, j) for i, j in combinations(df, 2) if df[i].equals(df[j])]
    

    Output:

    [('a', 'e'), ('b', 'f'), ('c', 'g'), ('d', 'h')]
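
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. It rebuilds the sample frame from the question; the helper name `equal_column_pairs` is made up for illustration and is not part of pandas. It also assumes the column labels are unique, since `frame[label]` returns a DataFrame rather than a Series when a label is repeated.

```python
from itertools import combinations

import pandas as pd


def equal_column_pairs(frame):
    """Return (left, right) label pairs whose columns hold identical values.

    Hypothetical helper for illustration; assumes unique column labels.
    """
    return [
        (left, right)
        for left, right in combinations(frame.columns, 2)
        if frame[left].equals(frame[right])
    ]


# The sample frame from the question.
df = pd.DataFrame(
    {
        "a": [1, 2, 3],
        "b": [2, 3, 4],
        "c": [3, 4, 5],
        "d": [4, 5, 6],
        "e": [1, 2, 3],
        "f": [2, 3, 4],
        "g": [3, 4, 5],
        "h": [4, 5, 6],
        "i": [5, 6, 7],
    }
)

pairs = equal_column_pairs(df)
print(pairs)
```

Two things to keep in mind: `Series.equals` treats NaNs in the same position as equal, and it also requires the dtypes to match, so an integer column and a float column with the same numbers will not be reported as a pair. The number of comparisons grows quadratically with the number of columns, which is fine for a frame this size but may be slow for very wide ones.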