I have a dataframe as follows:
For each column objectid
, iam trying to keep the row for which the column pct
is max.
I have done a groupby as follows:
df.groupby(['objectid'])['pct'].max().reset_index()
However, the issue is that i lose the information of other columns which i need to do other stuff.
here is an example of the output. The output only keeps two columns objectid
and pct
. However, i need all columns.
objectid pct
0 10000Aga 100.0
1 10001Aga 100.0
How can i do a groupy and retrieve the information of all columns.
You can sort the values by pct
and then keep the last one for each objectid
.
If you want to use groupby
:
df = df.sort_values("pct").groupby("objectid").last().reset_index()
Or you can use dropduplicates
, which is faster:
df = df.sort_values("pct").drop_duplicates(["objectid"], keep="last")