Search code examples
pythonpandasdataframemulti-index

In Pandas how to remove all subrows but keep one which has the highest value in a specific column in a multiIndex dataframe?


So I have a dataframe like this:

+---+-----+------------+------------+-------+
|   |     | something1 | something2 | score |
+---+-----+------------+------------+-------+
| 1 | 112 |       1.00 |       10.0 |    15 |
|   | 116 |       0.76 |      -2.00 |    14 |
| 8 | 112 |       0.76 |       0.05 |    55 |
|   | 116 |       1.00 |       1.02 |    54 |
+---+-----+------------+------------+-------+

And I want to achieve this:

+---+-----+------------+------------+-------+
|   |     | something1 | something2 | score |
+---+-----+------------+------------+-------+
| 1 | 112 |       1.00 |       10.0 |    15 |
| 8 | 112 |       1.00 |       1.02 |    55 |
+---+-----+------------+------------+-------+

I want to keep only one row for each first index which has the greatest score value.

I tried with something like this, sorting the df then selecting the first row in each group but it didn't work as expected:

df = df.sort_values("score", ascending=False).groupby(level=[0, 1]).first()

Thank you!


Solution

  • You only need to group by level 0:

    df.sort_values("score", ascending=False).groupby(level=0).first()
    #     something1 something2 score
    #1.0        1.00      10.00    15
    #8.0        0.76       0.05    55
    

    To keep the second level index, you can reset it to be a column and set it back as index later:

    (df.sort_values("score", ascending=False)
       .reset_index(level=1)
       .groupby(level=0).first()
       .set_index('level_1', append=True))
    
    #           something1  something2   score
    #   level_1         
    #1.0    112       1.00       10.00      15
    #8.0    112       0.76        0.05      55
    

    An alternative using nlargest:

    df.groupby(level=0, group_keys=False).apply(lambda g: g.nlargest(1, 'score'))
    #        something1 something2  score
    #1.0  112      1.00      10.00     15
    #8.0  112      0.76       0.05     55