I am currently pre-processing my data, and I understand that I have to apply the same scaling parameters to my test set that I used on my training set. However, when I applied the transform method from the sklearn library, I noticed something weird.
I first used preprocessing.MinMaxScaler(feature_range=(0,1)) on my training set, which sets the maximum to 1 and the minimum to 0. Next, I used minmax_scaler.transform(data) on my test set, and when I printed out the data frame I noticed values greater than 1. What can this possibly mean?
For a given feature x, min-max scaling to (0,1) effectively maps:
x to (x - min_train_x)/(max_train_x - min_train_x)
where min_train_x and max_train_x are the minimum and maximum values of x in the training set.
If a value of x in the test set is larger than max_train_x, the scaling transformation returns a value > 1. Likewise, a value smaller than min_train_x maps to a value < 0.
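To make the mapping concrete, here is a minimal pure-Python sketch of the same formula. The helper names fit_min_max and min_max_transform and the sample numbers are made up for illustration; this is not how sklearn is implemented internally, only the arithmetic described above.

```python
def fit_min_max(train_values):
    """Learn the scaling parameters from the training values only."""
    return min(train_values), max(train_values)


def min_max_transform(x, min_train_x, max_train_x):
    """Map x to (x - min_train_x) / (max_train_x - min_train_x)."""
    return (x - min_train_x) / (max_train_x - min_train_x)


# Hypothetical data: the test set contains values outside the training range.
train = [10.0, 20.0, 30.0]
test = [5.0, 25.0, 40.0]

min_train_x, max_train_x = fit_min_max(train)

scaled_train = [min_max_transform(x, min_train_x, max_train_x) for x in train]
scaled_test = [min_max_transform(x, min_train_x, max_train_x) for x in test]

print(scaled_train)  # [0.0, 0.5, 1.0]
print(scaled_test)   # [-0.25, 0.75, 1.5]
```

The training values land exactly in the 0 to 1 range because the minimum and maximum come from them, while 5.0 and 40.0 in the test set fall outside it.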
This is usually not a big problem, unless the input has to be in the (0,1) range.
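If the scaled values really do have to stay inside the range, one option is to clamp them. The sketch below assumes scikit-learn 0.24 or later, where MinMaxScaler accepts a clip parameter; the arrays X_train and X_test are made-up sample data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: one feature, with test values outside the training range.
X_train = np.array([[10.0], [20.0], [30.0]])
X_test = np.array([[5.0], [25.0], [40.0]])

# clip=True (assumes scikit-learn >= 0.24) clamps transformed values
# to feature_range instead of letting them fall outside it.
scaler = MinMaxScaler(feature_range=(0, 1), clip=True)
scaler.fit(X_train)  # parameters are learned from the training set only

X_test_scaled = scaler.transform(X_test)
print(X_test_scaled.ravel())  # every value now lies between 0 and 1
```

On older versions, applying np.clip(scaler.transform(X_test), 0, 1) to the output of an unclipped scaler should have the same effect. Note that clamping discards information about how far outside the training range a value was, so whether it is appropriate depends on the model.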