Search code examples
python-3.xpandasdataframegroup-by

Groupby: get the max value and keep all columns related


I have a dataframe as follows:

enter image description here

For each column objectid, iam trying to keep the row for which the column pct is max.

I have done a groupby as follows:

df.groupby(['objectid'])['pct'].max().reset_index()

However, the issue is that i lose the information of other columns which i need to do other stuff.

here is an example of the output. The output only keeps two columns objectid and pct. However, i need all columns.

    objectid    pct
0   10000Aga    100.0
1   10001Aga    100.0

How can i do a groupy and retrieve the information of all columns.


Solution

  • You can sort the values by pct and then keep the last one for each objectid.

    If you want to use groupby:

    df = df.sort_values("pct").groupby("objectid").last().reset_index()
    

    Or you can use dropduplicates, which is faster:

    df = df.sort_values("pct").drop_duplicates(["objectid"], keep="last")