
Normalize/scale dataframe in a certain range


I have the following Dataframe:

import pandas as pd

df = pd.DataFrame({'DateTime': {0: pd.Timestamp('2022-02-08 00:00:00'),
  1: pd.Timestamp('2022-02-08 00:10:00'),
  2: pd.Timestamp('2022-02-08 00:20:00'),
  3: pd.Timestamp('2022-02-08 00:30:00'),
  4: pd.Timestamp('2022-02-08 00:40:00')},
 'wind power [W]': {0: 83.9, 1: 57.2, 2: 58.2, 3: 48.0, 4: 69.5}})

             DateTime  wind power [W]
0 2022-02-08 00:00:00            83.9
1 2022-02-08 00:10:00            57.2
2 2022-02-08 00:20:00            58.2
3 2022-02-08 00:30:00            48.0
4 2022-02-08 00:40:00            69.5

As you can see, 83.9 is the maximum value in my second column and 48.0 is the minimum. I want to rescale these values to the range 0.6 to 8.4, so that 83.9 becomes 8.4 and 48.0 becomes 0.6, with the remaining values falling proportionally in between. So far I have only managed to normalize the column to the range 0 to 1, with this code:

df['normalized'] = (df['wind power [W]']-df['wind power [W]'].min())/(df['wind power [W]'].max()-df['wind power [W]'].min())

I don't know how to proceed from here to get these numbers into my desired range. Can someone help me, please?


Solution

  • We can use scikit-learn's MinMaxScaler for this. It accepts a feature_range parameter that sets the desired range of the transformed data:

    from sklearn.preprocessing import MinMaxScaler
    
    scaler = MinMaxScaler(feature_range=(0.6, 8.4))
    df['normalized'] = scaler.fit_transform(df['wind power [W]'].values[:, None])
    

    Alternatively, if you don't want to use MinMaxScaler, here is a way to scale the data using pandas only:

    w = df['wind power [W]'].agg(['min', 'max'])
    norm = (df['wind power [W]'] - w['min']) / (w['max'] - w['min'])
    df['normalized'] = norm * (8.4 - 0.6) + 0.6
    

    print(df)
    
                 DateTime  wind power [W]  normalized
    0 2022-02-08 00:00:00            83.9    8.400000
    1 2022-02-08 00:10:00            57.2    2.598886
    2 2022-02-08 00:20:00            58.2    2.816156
    3 2022-02-08 00:30:00            48.0    0.600000
    4 2022-02-08 00:40:00            69.5    5.271309
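
If you need the same rescaling for several columns or ranges, the pandas-only approach can be wrapped in a small helper. The following is only a sketch: the function name rescale, its parameters, and the handling of a constant column (mapped to the midpoint of the target range to avoid dividing by zero) are assumptions made for illustration, not something taken from pandas or scikit-learn.

```python
import pandas as pd


def rescale(values: pd.Series, new_min: float, new_max: float) -> pd.Series:
    """Linearly map values so the minimum lands on new_min and the maximum on new_max."""
    old_min, old_max = values.min(), values.max()
    span = old_max - old_min
    if span == 0:
        # Assumption: a constant column has no spread to map,
        # so every value is placed at the midpoint of the target range.
        return pd.Series((new_min + new_max) / 2, index=values.index, dtype=float)
    # Same two steps as above: normalize to 0..1, then stretch and shift.
    return (values - old_min) / span * (new_max - new_min) + new_min


# The wind power values from the question.
power = pd.Series([83.9, 57.2, 58.2, 48.0, 69.5], name='wind power [W]')
scaled = rescale(power, 0.6, 8.4)
print(scaled.round(6).tolist())
```

With the question's data this should give the same values as the normalized column shown above. On the original frame it would be called as df['normalized'] = rescale(df['wind power [W]'], 0.6, 8.4).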