Groupby: get the max value and keep all columns related

I have a dataframe as follows:

For each column objectid, iam trying to keep the row for which the column pct is max.

I have done a groupby as follows:

df.groupby(['objectid'])['pct'].max().reset_index()

However, the issue is that i lose the information of other columns which i need to do other stuff.

here is an example of the output. The output only keeps two columns objectid and pct. However, i need all columns.

    objectid    pct
0   10000Aga    100.0
1   10001Aga    100.0

How can i do a groupy and retrieve the information of all columns.

Solution

You can sort the values by pct and then keep the last one for each objectid.

If you want to use groupby:

df = df.sort_values("pct").groupby("objectid").last().reset_index()

Or you can use dropduplicates, which is faster:

df = df.sort_values("pct").drop_duplicates(["objectid"], keep="last")