
What happens when you transform the test set using MinMaxScaler?


I am currently pre-processing my data, and I understand that I have to apply the same scaling parameters to my test set that I used on my training set. However, when I applied the transform method from the sklearn library, I noticed something odd.

I first used preprocessing.MinMaxScaler(feature_range=(0,1)) on my training set, which maps the maximum to 1 and the minimum to 0. Next, I called minmax_scaler.transform(data) on my test set, and when I printed the DataFrame I noticed values greater than 1. What does this mean?


Solution

  • For a given feature x, min-max scaling to the range [0, 1] maps:

    x to (x - min_train_x) / (max_train_x - min_train_x)

    where min_train_x and max_train_x are the minimum and maximum values of x in the training set.

    If a value of x in the test set is larger than max_train_x, the transformation returns a value > 1. Likewise, a test value smaller than min_train_x yields a value < 0.

    This is usually not a problem unless a downstream step requires the input to lie strictly within the [0, 1] range.
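
To make this concrete, here is a minimal sketch of the behaviour described above. The arrays X_train and X_test are made-up illustrative data, not the asker's. The last part assumes scikit-learn 0.24 or newer, where MinMaxScaler accepts a clip parameter; on older versions you could apply numpy.clip to the transformed array instead.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up data: a single feature whose training values span 10..20.
X_train = np.array([[10.0], [15.0], [20.0]])
# The test set deliberately contains values outside the training range.
X_test = np.array([[5.0], [12.0], [25.0]])

scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(X_train)  # learns min_train_x = 10 and max_train_x = 20

X_train_scaled = scaler.transform(X_train)  # stays within [0, 1]
X_test_scaled = scaler.transform(X_test)    # may fall outside [0, 1]

# The same mapping computed by hand from the training statistics.
min_train = X_train.min(axis=0)
max_train = X_train.max(axis=0)
manual = (X_test - min_train) / (max_train - min_train)

print(X_train_scaled.ravel())  # approximately [0.0, 0.5, 1.0]
print(X_test_scaled.ravel())   # approximately [-0.5, 0.2, 1.5]

# Assumption: scikit-learn >= 0.24. clip=True clamps transformed
# values to feature_range, for cases where the bound is required.
clipping_scaler = MinMaxScaler(feature_range=(0, 1), clip=True).fit(X_train)
X_test_clipped = clipping_scaler.transform(X_test)
print(X_test_clipped.ravel())  # approximately [0.0, 0.2, 1.0]
```

Note that clipping discards how far outside the training range a test value lies, so it is only worth using when something downstream really needs values inside [0, 1].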