Search code examples
pythonpandas

Grouping on Multiple Columns then finding unique rows based on max


I have a data frame with 3 columns, 'Supplier', 'Category', and 'Amount'. This df is the result of a grouping operation where I grouped by Supplier, Category, and summed the Amounts. There are still many instance where there are multiple rows per supplier where they used different categories and had different amounts per category. What I'm hoping to do is filter the df so that only the category with the maximum amount per supplier is remaining. I've tried implementing this as part of the groupby, but hasn't worked, I've also followed tutorials where it shows using the .max() operation, but that results in a single line that shows the highest amount in the entire df. Any assistance would be appreciated - I can't share the actual data, but here is an example of what the df looks like now:

enter image description here

And what I would like the resulting df to look like:

enter image description here


Solution

  • UPDATE: this solution is not correct please check user19077881 answer.


    You can slice the dataframe by using the informatioa from GroupBy:

    g = df.groupby("SN")["Amount"].max()
    df = df.loc[df["SN"].isin(g.index) & df["Amount"].isin(g.values)].reset_index(drop=True)
    display(df)
    
        SN  Category    Amount
    0   1   Cat2        3000
    1   2   Cat22       5000