Tags: python, math, scikit-learn, scaling

Different Result from MinMaxScaler() with Manual Calculations


I was tinkering with MinMaxScaler and wanted to check how it works against a manual calculation. Here's the array I'm trying to scale:

array = [[0, -1.73, -1.73, -2.0, -2.0, -2.0, -1.73], 
        [-1.73, 0, -1.41, -1.73, -1.73, -1.73, -1.41], 
        [-1.73, -1.41, 0, -1.73, -1.73, -1.73, -0.0], 
        [-2.0, -1.73, -1.73, 0, -1.41, -1.41, -1.73], 
        [-2.0, -1.73, -1.73, -1.41, 0, -0.0, -1.73], 
        [-2.0, -1.73, -1.73, -1.41, -0.0, 0, -1.73], 
        [-1.73, -1.41, -0.0, -1.73, -1.73, -1.73, 0]]

The lowest value is -2.0 and the highest value is 0. My manual calculation is based on the MinMaxScaler() formula given in the sklearn documentation, but when I run it in code I get a different result:

from sklearn.preprocessing import MinMaxScaler
import numpy as np

X = np.array([
[0, -1.73, -1.73, -2.0, -2.0, -2.0, -1.73], 
[-1.73, 0, -1.41, -1.73, -1.73, -1.73, -1.41], 
[-1.73, -1.41, 0, -1.73, -1.73, -1.73, -0.0], 
[-2.0, -1.73, -1.73, 0, -1.41, -1.41, -1.73], 
[-2.0, -1.73, -1.73, -1.41, 0, -0.0, -1.73], 
[-2.0, -1.73, -1.73, -1.41, -0.0, 0, -1.73], 
[-1.73, -1.41, -0.0, -1.73, -1.73, -1.73, 0]
])

# create an instance of MinMaxScaler
scaler = MinMaxScaler()

# fit the scaler to the data and transform the data
X_scaled = scaler.fit_transform(X)

# print the scaled data
print(X_scaled)

The resulting array is:

[[1.        0.        0.        0.        0.        0.        0.       ]
 [0.135     1.        0.1849711 0.135     0.135     0.135     0.1849711]
 [0.135     0.1849711 1.        0.135     0.135     0.135     1.       ]
 [0.        0.        0.        1.        0.295     0.295     0.       ]
 [0.        0.        0.        0.295     1.        1.        0.       ]
 [0.        0.        0.        0.295     1.        1.        0.       ]
 [0.135     0.1849711 1.        0.135     0.135     0.135     1.       ]]

My calculation:

x' = (x - min) / (max - min)
x' = (-1.73 - (-2)) / (0 - (-2))
x' = 0.135
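
The hand calculation above can be reproduced in NumPy. This is only a sketch, and it assumes the min and max are taken over the whole array, which is the assumption behind the 0.135 figure and not necessarily what MinMaxScaler does. The names `global_min`, `global_max` and `X_manual` are illustrative.

```python
import numpy as np

X = np.array([
    [0, -1.73, -1.73, -2.0, -2.0, -2.0, -1.73],
    [-1.73, 0, -1.41, -1.73, -1.73, -1.73, -1.41],
    [-1.73, -1.41, 0, -1.73, -1.73, -1.73, -0.0],
    [-2.0, -1.73, -1.73, 0, -1.41, -1.41, -1.73],
    [-2.0, -1.73, -1.73, -1.41, 0, -0.0, -1.73],
    [-2.0, -1.73, -1.73, -1.41, -0.0, 0, -1.73],
    [-1.73, -1.41, -0.0, -1.73, -1.73, -1.73, 0],
])

# Assumption: a single min and max for the entire array
global_min = X.min()  # -2.0
global_max = X.max()  # 0.0

# x' = (x - min) / (max - min), applied to every element
X_manual = (X - global_min) / (global_max - global_min)

# Row 0, column 1 holds -1.73, the value used in the hand calculation
print(X_manual[0, 1])  # approximately 0.135
```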

My question is: where does my calculation differ from sklearn's? Why does -1.73 become 0?


Solution

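  • MinMaxScaler scales each column of the array independently, using that column's own min and max. -1.73 becomes 0 only where it is the smallest value in its column. Notice that -1.73 does not become 0 in the first column, where the column minimum is -2.0.

    This is intentional: in the formula given in the scikit-learn documentation, the min and max are computed with axis=0, that is, per column (feature).

    If you want to scale each element according to the min and max of the entire array, you can flatten the array into a single column before fitting and restore the original shape afterwards, something like this: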

    from sklearn.preprocessing import MinMaxScaler
    import numpy as np
    
    X = np.array(
        [
            [0, -1.73, -1.73, -2.0, -2.0, -2.0, -1.73],
            [-1.73, 0, -1.41, -1.73, -1.73, -1.73, -1.41],
            [-1.73, -1.41, 0, -1.73, -1.73, -1.73, -0.0],
            [-2.0, -1.73, -1.73, 0, -1.41, -1.41, -1.73],
            [-2.0, -1.73, -1.73, -1.41, 0, -0.0, -1.73],
            [-2.0, -1.73, -1.73, -1.41, -0.0, 0, -1.73],
            [-1.73, -1.41, -0.0, -1.73, -1.73, -1.73, 0],
        ]
    )
    
    # create an instance of MinMaxScaler
    scaler = MinMaxScaler()
    
    # flatten to one column so min and max are taken over all values, then restore the shape
    X_scaled = scaler.fit_transform(X.reshape(-1, 1)).reshape(*X.shape)
    
    # print the scaled data
    print(X_scaled)
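
To see the per-column behaviour by hand, the same formula can be written in plain NumPy with axis=0. This is a minimal sketch rather than scikit-learn's actual implementation; `col_min`, `col_max` and `X_cols` are illustrative names, and it assumes no column is constant (otherwise the division would be by zero).

```python
import numpy as np

X = np.array([
    [0, -1.73, -1.73, -2.0, -2.0, -2.0, -1.73],
    [-1.73, 0, -1.41, -1.73, -1.73, -1.73, -1.41],
    [-1.73, -1.41, 0, -1.73, -1.73, -1.73, -0.0],
    [-2.0, -1.73, -1.73, 0, -1.41, -1.41, -1.73],
    [-2.0, -1.73, -1.73, -1.41, 0, -0.0, -1.73],
    [-2.0, -1.73, -1.73, -1.41, -0.0, 0, -1.73],
    [-1.73, -1.41, -0.0, -1.73, -1.73, -1.73, 0],
])

# One min and one max per column (axis=0), not one for the whole array
col_min = X.min(axis=0)
col_max = X.max(axis=0)

# Broadcasting applies each column's own min and max to that column
X_cols = (X - col_min) / (col_max - col_min)

# Column 1 has minimum -1.73, so -1.73 maps to 0 there
print(X_cols[0, 1])
# Column 0 has minimum -2.0, so -1.73 maps to about 0.135 there
print(X_cols[1, 0])
```

The second row of `X_cols` should line up with the second row of the output printed in the question (0.135, 1, 0.1849711, ...).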