Search code examples
pythonpandasdataframenumpydata-science

Selecting row with highest value based on two different columns


I have a dataframe with 3 columns: I want to make a rule that if for a same city and same id, pick the maximum value and drop the row with lower value.

eg:

City ID Value
London 1 12.45
Amsterdam 1 14.56
Paris 1 16.89
New York 1 23.86
Chicago 1 14.56
Chicago 1 20.76

Expected Output : Select highest value for same city and ID. Here Chicago has 2 entries with same ID, I want to select the row with highest value.

Expected Output

City ID Value
London 1 12.45
Amsterdam 1 14.56
Paris 1 16.89
New York 1 23.86
Chicago 1 20.76

Solution

  • Sort the dataframe by Value in descending order and drop the duplicates:

    df = df.sort_values(by=['Value'], ascending=False).drop_duplicates(['City', 'ID'], keep='first')
    print(df)
    

    Prints:

            City  ID  Value
    3   New York   1  23.86
    5    Chicago   1  20.76
    2      Paris   1  16.89
    1  Amsterdam   1  14.56
    0     London   1  12.45