Search code examples
pandasduplicatesrows

keep row with highest value amongst duplicates on different columns


I have a pandas dataframe like this one, where I can have rows with same combination of long and lat:

Initial df:

   lon  lat         name  value protection      a      b         c  score
0   20   10       canada    563        NaN    cat    dog  elephant   20.0
1   30   10       canada     65        NaN   lion  tiger       cat   30.0
2   40   20       canada    893        NaN    dog    NaN       NaN   20.0
3   40   20          usa      4        NaN  horse  horse      lion   40.0
4   45   15          usa   8593        NaN    NaN   lion       cat   10.0
5   20   10  protection1    100     medium    NaN    NaN       NaN    NaN
6   40   20  protection1     20       high    NaN    NaN       NaN    NaN
7   50   30  protection1    500        low    NaN    NaN       NaN    NaN

but what I want is:

wanted output:

   lon  lat protection      a      b         c  score
0   20   10     medium    cat    dog  elephant   20.0
1   30   10        NaN   lion  tiger       cat   30.0
2   40   20       high  horse  horse      lion   40.0
3   45   15        NaN    NaN   lion       cat   10.0
4   50   30        low    NaN    NaN       NaN    NaN

The output dataframe should contain rows with a unique combination of long and lat columns where only the row with the highest score is retained but if long and lat have duplicates and a value in the protection column these should be join into one


Solution

  • Try:

    df = df.sort_values(by="score", ascending=False)
    g = df.groupby(["lon", "lat"])
    df_out = (
        g.first()
        .assign(
            protection=g.agg(
                {"protection": lambda x: ",".join(x.dropna())}
            ).replace("", np.nan)
        )
        .reset_index()
    )
    
    print(df_out)
    

    Prints:

       lon  lat         name  value protection      a      b         c  score
    0   20   10       canada    563     medium    cat    dog  elephant   20.0
    1   30   10       canada     65        NaN   lion  tiger       cat   30.0
    2   40   20          usa      4       high  horse  horse      lion   40.0
    3   45   15          usa   8593        NaN    NaN   lion       cat   10.0
    4   50   30  protection1    500        low    NaN    NaN       NaN    NaN