I would like to display the duplicates of a dataframe to get a better understanding of them. Specifically, I would like to group the duplicated rows.
This example hopefully clarifies what I want to do. Assume we are given the dataframe below:
CC BF FA WC Strength
1 2 3 4 1
2 3 4 5 6
1 2 3 4 8
1 2 3 4 4
2 3 4 5 7
Here rows 1, 3, 4 and rows 2, 5 are duplicates once Strength is ignored. I would like to get a new dataframe that displays:
CC BF FA WC Strength_min Strength_max Count
1 2 3 4 1 8 3
2 3 4 5 6 7 2
You need a custom groupby.agg with the output from Index.difference as grouper:
(df.groupby(list(df.columns.difference(['Strength'], sort=False)))['Strength']
.agg(**{'Strength_min': 'min', 'Strength_max': 'max', 'Count': 'count'})
.reset_index()
)
Output:
CC BF FA WC Strength_min Strength_max Count
0 1 2 3 4 1 8 3
1 2 3 4 5 6 7 2
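For anyone who wants to run this end to end, here is a minimal self-contained sketch. The dataframe is rebuilt from the example in the question, and the names `key_cols` and `summary` are just illustrative choices, not part of the original snippet. The aggregation is written with plain keyword arguments, which should be equivalent to unpacking the dict with `**`.

```python
import pandas as pd

# Rebuild the example dataframe from the question.
df = pd.DataFrame({
    "CC": [1, 2, 1, 1, 2],
    "BF": [2, 3, 2, 2, 3],
    "FA": [3, 4, 3, 3, 4],
    "WC": [4, 5, 4, 4, 5],
    "Strength": [1, 6, 8, 4, 7],
})

# Every column except Strength defines what counts as a duplicate.
key_cols = list(df.columns.difference(["Strength"], sort=False))

# One row per group of duplicates, with min, max and count of Strength.
summary = (
    df.groupby(key_cols)["Strength"]
      .agg(Strength_min="min", Strength_max="max", Count="count")
      .reset_index()
)
print(summary)
```

Note that `'count'` skips missing Strength values. If the number of rows per group is wanted regardless of NaN, `'size'` can be used in its place.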