So I was tinkering with MinMaxScaler and wanted to see how it works by doing the calculations manually. Here's the array I'm trying to scale:
array = [[0, -1.73, -1.73, -2.0, -2.0, -2.0, -1.73],
[-1.73, 0, -1.41, -1.73, -1.73, -1.73, -1.41],
[-1.73, -1.41, 0, -1.73, -1.73, -1.73, -0.0],
[-2.0, -1.73, -1.73, 0, -1.41, -1.41, -1.73],
[-2.0, -1.73, -1.73, -1.41, 0, -0.0, -1.73],
[-2.0, -1.73, -1.73, -1.41, -0.0, 0, -1.73],
[-1.73, -1.41, -0.0, -1.73, -1.73, -1.73, 0]]
The lowest value is -2.0 and the highest value is 0. My manual calculations are based on the MinMaxScaler() formula stated in the sklearn documentation, but when I program it, it shows a different result, as in this code:
from sklearn.preprocessing import MinMaxScaler
import numpy as np
X = np.array([
[0, -1.73, -1.73, -2.0, -2.0, -2.0, -1.73],
[-1.73, 0, -1.41, -1.73, -1.73, -1.73, -1.41],
[-1.73, -1.41, 0, -1.73, -1.73, -1.73, -0.0],
[-2.0, -1.73, -1.73, 0, -1.41, -1.41, -1.73],
[-2.0, -1.73, -1.73, -1.41, 0, -0.0, -1.73],
[-2.0, -1.73, -1.73, -1.41, -0.0, 0, -1.73],
[-1.73, -1.41, -0.0, -1.73, -1.73, -1.73, 0]
])
# create an instance of MinMaxScaler
scaler = MinMaxScaler()
# fit the scaler to the data and transform the data
X_scaled = scaler.fit_transform(X)
# print the scaled data
print(X_scaled)
The resulting array is:
[[1. 0. 0. 0. 0.
0. 0. ]
[0.135 1. 0.1849711 0.135 0.135
0.135 0.1849711]
[0.135 0.1849711 1. 0.135 0.135
0.135 1. ]
[0. 0. 0. 1. 0.295
0.295 0. ]
[0. 0. 0. 0.295 1.
1. 0. ]
[0. 0. 0. 0.295 1.
1. 0. ]
[0.135 0.1849711 1. 0.135 0.135
0.135 1. ]]
My Calculations
x' = (x-min)/ (max - min)
x' = (-1.73-(-2)) / (0 -(-2))
x' = 0.135
My question is: where do my calculations differ from sklearn's? Why does -1.73 become 0?
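MinMaxScaler scales each column (feature) of the array independently, using that column's own min and max. -1.73 becomes 0 only when it is the smallest value in its column. Notice that -1.73 does not become 0 in the first column, where the minimum is -2.0; there it becomes 0.135, which matches your calculation.
This is intentional: the scaler works per column, i.e. along axis=0. (The MinMaxScaler class itself has no axis parameter; the related function minmax_scale is the one that exposes axis, with a default of 0.)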
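To check the per-column behaviour yourself, here is a minimal sketch that applies the same formula column by column with plain NumPy and compares it with the scaler's output. The names col_min, col_max and manual are only illustrative, not anything sklearn defines.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([
    [0, -1.73, -1.73, -2.0, -2.0, -2.0, -1.73],
    [-1.73, 0, -1.41, -1.73, -1.73, -1.73, -1.41],
    [-1.73, -1.41, 0, -1.73, -1.73, -1.73, -0.0],
    [-2.0, -1.73, -1.73, 0, -1.41, -1.41, -1.73],
    [-2.0, -1.73, -1.73, -1.41, 0, -0.0, -1.73],
    [-2.0, -1.73, -1.73, -1.41, -0.0, 0, -1.73],
    [-1.73, -1.41, -0.0, -1.73, -1.73, -1.73, 0],
])

# min and max of each column (axis=0 reduces over the rows)
col_min = X.min(axis=0)
col_max = X.max(axis=0)

# x' = (x - min) / (max - min), applied with each column's own min and max
manual = (X - col_min) / (col_max - col_min)

# what MinMaxScaler produces with its default settings
sklearn_scaled = MinMaxScaler().fit_transform(X)

print(col_min)                               # per-column minimums
print(np.allclose(manual, sklearn_scaled))   # do the two results agree?
```

In the second column the minimum is -1.73 rather than -2.0, which is why -1.73 maps to 0 there.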
If you want to scale each element of the array according to the min and max of the entire array you could do something like this:
from sklearn.preprocessing import MinMaxScaler
import numpy as np
X = np.array(
[
[0, -1.73, -1.73, -2.0, -2.0, -2.0, -1.73],
[-1.73, 0, -1.41, -1.73, -1.73, -1.73, -1.41],
[-1.73, -1.41, 0, -1.73, -1.73, -1.73, -0.0],
[-2.0, -1.73, -1.73, 0, -1.41, -1.41, -1.73],
[-2.0, -1.73, -1.73, -1.41, 0, -0.0, -1.73],
[-2.0, -1.73, -1.73, -1.41, -0.0, 0, -1.73],
[-1.73, -1.41, -0.0, -1.73, -1.73, -1.73, 0],
]
)
# create an instance of MinMaxScaler
scaler = MinMaxScaler()
# flatten to a single column so one global min/max is used, then restore the original shape
X_scaled = scaler.fit_transform(X.reshape(-1, 1)).reshape(*X.shape)
# print the scaled data
print(X_scaled)
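If you only need the global scaling and not a fitted scaler object, the same result can probably be had with plain NumPy. This is a sketch of that alternative, assuming the default 0 to 1 output range; global_min, global_max and X_scaled_global are illustrative names.

```python
import numpy as np

X = np.array([
    [0, -1.73, -1.73, -2.0, -2.0, -2.0, -1.73],
    [-1.73, 0, -1.41, -1.73, -1.73, -1.73, -1.41],
    [-1.73, -1.41, 0, -1.73, -1.73, -1.73, -0.0],
    [-2.0, -1.73, -1.73, 0, -1.41, -1.41, -1.73],
    [-2.0, -1.73, -1.73, -1.41, 0, -0.0, -1.73],
    [-2.0, -1.73, -1.73, -1.41, -0.0, 0, -1.73],
    [-1.73, -1.41, -0.0, -1.73, -1.73, -1.73, 0],
])

# min and max over the whole array (no axis argument)
global_min = X.min()
global_max = X.max()

# x' = (x - min) / (max - min), using the global min and max for every element
X_scaled_global = (X - global_min) / (global_max - global_min)

# every -1.73 now maps to roughly 0.135, as in the manual calculation
print(np.round(X_scaled_global, 3))
```

Here every -1.73 in the array is scaled against -2.0 and 0, so it comes out the same regardless of which column it sits in.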