Search code examples
pythonpandasdataframegroup-byfiltering

How to apply multilevel groupby and filter top N records within each subgroup?


Input:
Input

I have an input DataFrame as shown above. The products in each category need to be grouped by the sum of their sales. From this we select the top N products(in this example,N=2) in each category and only those products from the respective categories should be displayed in the final output/dataFrame as shown below

Output:
Output

I could try creating loops and iterating. I was hoping there would be an elegant way to do this.

pd.DataFrame({'Date':['07-08-2023','06-08-2023','08-08-2023','08-08-2023','07-08-2023','06-08-2023','06-08-2023','08-08-2023'],
             'Category':['A','A','A','A','B','B','B','B'],
             'Products':['abc','xyz','abc','def','nop','hjk','qrs','qrs'],
             'Sales':[1,2,4,3,4,5,2,3]})

Solution

  • Using a groupby to get the top N products, then a merge:

    N = 2
    cols = ['Category', 'Products']
    
    out = df.merge(df
       .groupby(cols, as_index=False)['Sales'].sum()
       .sort_values(by='Sales', ascending=False)
       .groupby('Category', as_index=False)
       [cols].head(N)
     )
    

    Output:

             Date Category Products  Sales
    0  07-08-2023        A      abc      1
    1  08-08-2023        A      abc      4
    2  08-08-2023        A      def      3
    3  06-08-2023        B      hjk      5
    4  06-08-2023        B      qrs      2
    5  08-08-2023        B      qrs      3