Search code examples
pythonpandasfilterpandas-groupby

Pandas DF - Filter DF by Value in Column A, Highest Value of in Column B (Group By?)


I'm trying to do something I feel shouldn't be too difficult but I'm having trouble with it.

The best way would be to illustrate it. I currently have a DF with many rows/columns. I want to grab the highest value of Column B, in the unique value of Column A, and drop the rest of the rows that don't matter. I am having a hard to articulating what is is, maybe grouping etc

For example:

Initial Table

Col A  - Col B - Col C
2012        1       2
2012        1       7
2012        2       45
2012        2       34
2012        3       4
2012        3       32
2013        1       54
2013        1       3
2013        2       5
2013        2       23

Table I want to end up with

Col A  - Col B - Col C
2012        3       4
2012        3       32
2013        2       5
2013        2       23

Thanks for any help everyone!!

Will keep searching around but it's a tricky one


Solution

  • Try:

    df_out = df.groupby("Col A").apply(lambda x: x[x["Col B"] == x["Col B"].max()])
    print(df_out.reset_index(drop=True))
    

    Prints:

       Col A  Col B  Col C
    0   2012      3      4
    1   2012      3     32
    2   2013      2      5
    3   2013      2     23
    

    Or: using .transform:

    df_out = df[df["Col B"] == df.groupby("Col A")["Col B"].transform("max")]
    print(df_out)