Pandas MultiIndex dataframe replace min and max with 0


I have a dataframe with a MultiIndex and am trying to find a way to replace the values that equal the min or max with 0 within each group of level 0. For example:

import pandas as pd
import numpy as np

d = {'index0': ['p1', 'p1', 'p1', 'p2', 'p2', 'p2', 'p2', 'p2', 'p3', 'p3', 'p3', 'p3', 'p3'],
'index1': ['u1', 'u2', 'u3', 'u1', 'u2', 'u3', 'u4', 'u5', 'u1', 'u2', 'u3', 'u4', 'u5'],
'data': [2, 16, 49, -67, -57, -30, 14, 76, 52, -26, 38, 79, 65]}

df = pd.DataFrame(data=d)
df = df.set_index(['index0', 'index1'], drop = False)

print(df)

I want from this dataframe:

              index0 index1  data
index0 index1
p1     u1         p1     u1     2
       u2         p1     u2    16
       u3         p1     u3    49
p2     u1         p2     u1   -67
       u2         p2     u2   -57
       u3         p2     u3   -30
       u4         p2     u4    14
       u5         p2     u5    76
p3     u1         p3     u1    52
       u2         p3     u2   -26
       u3         p3     u3    38
       u4         p3     u4    79
       u5         p3     u5    65

Get this:

              index0 index1  data
index0 index1
p1     u1         p1     u1     0
       u2         p1     u2    16
       u3         p1     u3     0
p2     u1         p2     u1     0
       u2         p2     u2   -57
       u3         p2     u3   -30
       u4         p2     u4    14
       u5         p2     u5     0
p3     u1         p3     u1    52
       u2         p3     u2     0
       u3         p3     u3    38
       u4         p3     u4     0
       u5         p3     u5    65

I've tried adapting similar examples but couldn't get them to work. What is the best way to do this?


Solution

  • Neither of these solutions uses your set_index line, so skip it: with drop=False, index0 ends up as both an index level and a column, which makes groupby('index0') ambiguous.

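    Using groupby, agg and join:

    # Per-group min and max, as columns data_min and data_max
    s = df.groupby('index0').data.agg(['min', 'max']).add_prefix('data_')
    # Attach the group statistics to every row
    out = df.set_index('index0').join(s)
    # Zero out rows whose value equals the group min or max
    out.loc[out.data.eq(out.data_min) | out.data.eq(out.data_max), 'data'] = 0
    # Drop the helper columns and restore index0 as a regular column
    out = out.drop(columns=['data_min', 'data_max']).reset_index()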
    

    Using np.where:

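    # Same helper frame as above: each row joined with its group's min and max
    s = df.groupby('index0').data.agg(['min', 'max']).add_prefix('data_')
    out = df.set_index('index0').join(s)
    # np.where works positionally, so the result lines up with df's rows
    result = df.assign(data=np.where(out.data.eq(out.data_min) | out.data.eq(out.data_max), 0, out.data))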
    

    Both of these produce:

       index0 index1  data
    0      p1     u1     0
    1      p1     u2    16
    2      p1     u3     0
    3      p2     u1     0
    4      p2     u2   -57
    5      p2     u3   -30
    6      p2     u4    14
    7      p2     u5     0
    8      p3     u1    52
    9      p3     u2     0
    10     p3     u3    38
    11     p3     u4     0
    12     p3     u5    65
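
If you would rather keep the MultiIndex from the question, a different technique from the two above may also work: group the data column by index level 0 and use transform to broadcast each group's min and max back onto the rows. This is only a sketch under that assumption. The names is_extreme and result are illustrative, and ties (several rows sharing the group min or max) are all set to 0.

```python
import pandas as pd

d = {
    'index0': ['p1', 'p1', 'p1', 'p2', 'p2', 'p2', 'p2', 'p2', 'p3', 'p3', 'p3', 'p3', 'p3'],
    'index1': ['u1', 'u2', 'u3', 'u1', 'u2', 'u3', 'u4', 'u5', 'u1', 'u2', 'u3', 'u4', 'u5'],
    'data': [2, 16, 49, -67, -57, -30, 14, 76, 52, -26, 38, 79, 65],
}
# Keep the set_index line from the question, MultiIndex included.
df = pd.DataFrame(d).set_index(['index0', 'index1'], drop=False)

# Group the 'data' Series by index level 0; selecting the column first
# avoids the index-level/column name clash on 'index0'.
grouped = df['data'].groupby(level=0)
group_min = grouped.transform('min')  # each row gets its group's min
group_max = grouped.transform('max')  # each row gets its group's max

# Flag rows equal to their group's min or max, then replace them with 0.
is_extreme = df['data'].eq(group_min) | df['data'].eq(group_max)
result = df.assign(data=df['data'].mask(is_extreme, 0))
print(result)
```

Because transform returns a Series with the same index as its input, no join or helper columns are needed, and the original row order and index are preserved.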