Tags: python-3.x, dataframe, scikit-learn, normalization, sklearn-pandas

How to use MaxAbsScaler to standardize values between 1 and 100


Given that I have data in a data frame as follows:

import pandas as pd

value_1 = [1, 2, 3, 4, 5]
value_2 = [1000, 20000, 50000, 33000, 21000]
value_3 = [0, 1, 0, 1, 1]
value_4 = [4, 8, 12, 10, 19]
target  = [1, 22, 100, 77, 100]

# Build the frame in one step; the dict keys become the column names
data_final = pd.DataFrame({
    'obs1': value_1,
    'obs2': value_2,
    'obs3': value_3,
    'obs4': value_4,
    'target': target,
})


The target column ranges from 1 to 100, so I would like to rescale the other columns to range from 1 to 100 as well.

How can I do this using sklearn.preprocessing? I found the MaxAbsScaler class, but I do not understand which parameters to pass so that the values end up between 1 and 100.


Solution

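  • MaxAbsScaler has no range parameter: it divides each column by its largest absolute value, so the output always lies in [-1, 1]. You probably want MinMaxScaler instead, whose feature_range parameter sets the output range of every column ((1, 100) in your case). This is how it would be done: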

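    from sklearn.preprocessing import MinMaxScaler

    # Scale only the feature columns; target already spans 1 to 100
    data = data_final[['obs1', 'obs2', 'obs3', 'obs4']]

    # feature_range sets the output minimum and maximum of every column
    minmax = MinMaxScaler(feature_range=(1, 100))
    minmax.fit_transform(data)  # fit and transform in a single call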
    

    This will return the following:

    array([[  1.        ,   1.        ,   1.        ,   1.        ],
           [ 25.75      ,  39.3877551 , 100.        ,  27.4       ],
           [ 50.5       , 100.        ,   1.        ,  53.8       ],
           [ 75.25      ,  65.65306122, 100.        ,  40.6       ],
           [100.        ,  41.40816327, 100.        , 100.        ]])
    

    As you can see, every column now ranges from 1 to 100, as desired.
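
The scaler returns a plain NumPy array, so the column names are lost. If you want a DataFrame back, a minimal sketch could look like the following. The names feature_cols, data_scaled, lo, hi and manual are only illustrative, and the manual formula is included just to show what MinMaxScaler computes for each column.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

data_final = pd.DataFrame({
    'obs1': [1, 2, 3, 4, 5],
    'obs2': [1000, 20000, 50000, 33000, 21000],
    'obs3': [0, 1, 0, 1, 1],
    'obs4': [4, 8, 12, 10, 19],
    'target': [1, 22, 100, 77, 100],
})

# Columns to rescale (illustrative name); target is left untouched
feature_cols = ['obs1', 'obs2', 'obs3', 'obs4']

scaler = MinMaxScaler(feature_range=(1, 100))
scaled = scaler.fit_transform(data_final[feature_cols])  # NumPy array

# Rebuild a DataFrame with the original index and column names,
# then re-attach the unscaled target column
data_scaled = pd.DataFrame(scaled, columns=feature_cols, index=data_final.index)
data_scaled['target'] = data_final['target']

# The same result by hand, per column:
# x_scaled = (x - min) / (max - min) * (hi - lo) + lo
lo, hi = 1, 100
x = data_final[feature_cols]
manual = (x - x.min()) / (x.max() - x.min()) * (hi - lo) + lo

print(data_scaled)
```

The hand-written formula divides by max - min, so it only works when no column is constant.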