
Displaying duplicates in pandas


I would like to display the duplicate rows of a dataframe to understand them better. Specifically, I would like to group the duplicated rows and summarize each group.

The example below hopefully clarifies what I want to do. Assume we are given the following dataframe:


CC BF FA WC Strength
1  2  3  4  1
2  3  4  5  6
1  2  3  4  8
1  2  3  4  4
2  3  4  5  7

Here rows 1, 3, 4 and rows 2, 5 are duplicates of each other once Strength is ignored. I would like to get a new dataframe that displays:

CC BF FA WC Strength_min Strength_max Count
1  2  3  4  1            8             3
2  3  4  5  6            7             2

Solution

  • Group by every column except Strength, using the output of Index.difference as the grouper, then apply a custom groupby.agg to the Strength column:

    (df.groupby(list(df.columns.difference(['Strength'], sort=False)))['Strength']
       .agg(**{'Strength_min': 'min', 'Strength_max': 'max', 'Count': 'count'})
       .reset_index()
    )
    

    Output:

       CC  BF  FA  WC  Strength_min  Strength_max  Count
    0   1   2   3   4             1             8      3
    1   2   3   4   5             6             7      2
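
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The dataframe is rebuilt by hand from the table in the question, and the variable names `keys` and `summary` are only illustrative. It passes the aggregations as plain keyword arguments, which should be equivalent to unpacking the dictionary as above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "CC": [1, 2, 1, 1, 2],
    "BF": [2, 3, 2, 2, 3],
    "FA": [3, 4, 3, 3, 4],
    "WC": [4, 5, 4, 4, 5],
    "Strength": [1, 6, 8, 4, 7],
})

# Every column except Strength acts as a grouping key.
# sort=False asks pandas not to sort the resulting column labels.
keys = list(df.columns.difference(["Strength"], sort=False))

# Named aggregation: each keyword becomes an output column.
summary = (
    df.groupby(keys)["Strength"]
      .agg(Strength_min="min", Strength_max="max", Count="count")
      .reset_index()
)

print(summary)
```

Note that `'count'` skips missing Strength values. If Strength can contain NaN and every row should be counted, `'size'` is the aggregation to use instead.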