Input:
I have an input DataFrame as shown above. The products in each category need to be grouped by the sum of their sales. From this we select the top N products(in this example,N=2) in each category and only those products from the respective categories should be displayed in the final output/dataFrame as shown below
Output:
I could try creating loops and iterating. I was hoping there would be an elegant way to do this.
pd.DataFrame({'Date':['07-08-2023','06-08-2023','08-08-2023','08-08-2023','07-08-2023','06-08-2023','06-08-2023','08-08-2023'],
'Category':['A','A','A','A','B','B','B','B'],
'Products':['abc','xyz','abc','def','nop','hjk','qrs','qrs'],
'Sales':[1,2,4,3,4,5,2,3]})
Using a groupby
to get the top N products, then a merge
:
N = 2
cols = ['Category', 'Products']
out = df.merge(df
.groupby(cols, as_index=False)['Sales'].sum()
.sort_values(by='Sales', ascending=False)
.groupby('Category', as_index=False)
[cols].head(N)
)
Output:
Date Category Products Sales
0 07-08-2023 A abc 1
1 08-08-2023 A abc 4
2 08-08-2023 A def 3
3 06-08-2023 B hjk 5
4 06-08-2023 B qrs 2
5 08-08-2023 B qrs 3