I have a dataframe df
ID | Price | Region |
---|---|---|
1 | 23 | AUS |
1 | 45 | DXB |
2 | 25 | GER |
2 | 18 | TUN |
I want to write a code in python to get the following output
ID | Price | Region |
---|---|---|
1 | 45 | DXB |
2 | 25 | TUN |
I have tried using pandasql to get the output but it is not giving the output I want
The code I have tried is
import pandas as pd
import pandasql as ps
#to read table
df=pd.read_excel("test.xlsx")
ps.sqldf("select ID, max(Price), Region from df order by ID")
If there is any other code in python itself (without using pandasql) that can get the above output, please let me know
You can use groupby.transform
output_df = df[df['Price'].eq(df.groupby("ID")['Price'].transform("max"))]
Or with ps.sqldf
using window function to get the max price and then return rows where Price equals max price:
output_df = ps.sqldf("""select ID,Price,Region from
(select *, max(Price) over (partition by ID) max_Price from df)
where Price = max_Price""")
ID Price Region
0 1 45 DXB
1 2 25 GER