Search code examples
pythonpandasdataframemulti-index

Multi index dataframe delete row with maximum value per group


I have a multi index dataframe like this:

PID    Fid    x      y

  A      1    2      3
         2    6      1
         3    4      6
  B      1    3      5
         2    2      4
         3    5      7

I would like to delete the rows with the highest x-value per patient (PID). I need to get a new dataframe with the remaining rows and all columns to continue my analysis on these data, for example the mean of the remaining y-values. The dataframe should look like this:

PID    Fid    x      y

  A      1    2      3
         3    4      6
  B      1    3      5
         2    2      4

I used the code from Python Multiindex Dataframe remove maximum

idx = (df.reset_index('Fid')
                   .groupby('PID')['x']
                   .max()
                   .reset_index()
                   .values.tolist())
df_s = df.loc[df.index.difference(idx)]

I can get idx, but not remove them from the dataframe. It says TypeError: unhashable type: 'list'

What do I do wrong?


Solution

  • You can try this:

    idx = df.groupby(level=0)['x'].idxmax()
    df[~df.index.isin(idx)]
    
             x  y
    PID Fid      
    A   1    2  3
        3    4  6
    B   1    3  5
        2    2  4
    

    Or

    You can use pd.Index.difference here.

    df.loc[df.index.difference(df['x'].groupby(level=0).idxmax())] #Use level=0 if index is unnamed
                                             #('PID').idxmax())] 
             x  y
    PID Fid      
    A   1    2  3
        3    4  6
    B   1    3  5
        2    2  4