Search code examples
pythonpandasscikit-learnsklearn-pandas

Group by MinMaxScaler in pandas dataframe


I would like to apply minmax scaler to column X2 and X3 in dataframe df and add columns X2_Scale and X3_Scale for each month.

df = pd.DataFrame({
    'Month': [1,1,1,1,1,1,2,2,2,2,2,2,2],
    'X1': [12,10,100,55,65,60,35,25,10,15,30,40,50],
    'X2': [10,15,24,32,8,6,10,23,24,56,45,10,56],
    'X3': [12,90,20,40,10,15,30,40,60,42,2,4,10]
})

Below code is what I tried but got en error.

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

cols = df.columns[2:4]
df[cols + 'scale'] = df.groupby('Month')[cols].scaler.fit_transform(df[cols])

How can I do this? Thank you.


Solution

  • Approach 1

    Group and transform the columns X2 and X3 using a scaling function which applies the min-max scaling transformation and returns the scaled values

    def scale(X):
        X_ = np.atleast_2d(X)
        return pd.DataFrame(scaler.fit_transform(X_), X.index)
    
    df[cols + '_scale'] = df.groupby('Month')[cols].apply(scale)
    

    Approach 2

    Lets stick to the basics and calculate the min, max values from corresponding columns X2 and X3 for each group, then apply the scaling formula to the columns using the calculated min-max values

    g = df.groupby('Month')[cols]
    min_, max_ = g.transform('min'), g.transform('max')
    df[cols + '_scale'] = (df[cols] - min_) / (max_ - min_)
    

    Result

        Month   X1  X2  X3  X2_scale  X3_scale
    0       1   12  10  12  0.153846  0.025000
    1       1   10  15  90  0.346154  1.000000
    2       1  100  24  20  0.692308  0.125000
    3       1   55  32  40  1.000000  0.375000
    4       1   65   8  10  0.076923  0.000000
    5       1   60   6  15  0.000000  0.062500
    6       2   35  10  30  0.000000  0.482759
    7       2   25  23  40  0.282609  0.655172
    8       2   10  24  60  0.304348  1.000000
    9       2   15  56  42  1.000000  0.689655
    10      2   30  45   2  0.760870  0.000000
    11      2   40  10   4  0.000000  0.034483
    12      2   50  56  10  1.000000  0.137931