Say I have following dataframe:
d = {'col1': ["8","8","8","8","8","2","2","2","2","3","3"], 'col2': ['a', 'b','b','b','b','a','b','a','a','a','b'],
'col3': ['m', 'n','z','b','a','ac','b1','ad','a1','a','b1'],'col4': ['m', 'n','z','b1','a','ac1','b31','a1d','3a1','a3','b1']}
test = pd.DataFrame(data=d)
In order to sort each grouped item with count
, I could do the following:
test.groupby(["col1",'col2'])['col4'].count().reset_index(name="count").sort_values(["col1","count"],ascending=[True,False]).
It returns this table:
However, I want the group with 8
in col1
to be the first item because this particular group has the highest count (i.e., 4).
How do I achieve this?
Edit: This is the expected output:
col1 col2 count
8 b 4
8 a 1
2 a 3
2 b 1
3 a 1
3 b 1
The expected output is unclear, but assuming you want to sort the rows within each group by decreasing orders of count, and also the groups with each other by decreasing order of the max (or total) count.
(test.groupby(["col1",'col2'])['col4'].count()
.reset_index(name="count")
# using the max count per group, for the total use transform('sum')
.assign(maxcount=lambda d: d.groupby('col1')['count'].transform('max'))
.sort_values(['maxcount', 'count'], ascending=False)
.drop(columns='maxcount')
)
Output:
col1 col2 count
5 8 b 4
4 8 a 1
0 2 a 3
1 2 b 1
2 3 a 1
3 3 b 1