machine-learning · scale · normalization

How does MinMaxScaler affect my test data?


Say I want to predict stock prices and I have my training data, where I know the minimum and maximum value. This seems like a good case to use MinMaxScaler, but what I'm wondering is the following. If I know from my training data that the highest value is set to 1, what happens when a stock price in my test data reaches a higher value than what I normalized to 1 in the first place? Does it just overwrite it and assign it as the new maximum?


Solution

  • Scalers in sklearn have three notable methods which you should use when running these types of programs:

    • scaler.fit(x) - this will set your scaler's min and max values (when using MinMaxScaler) to those found in x
    • y_transformed = scaler.transform(y) - this will transform the data y using the parameters found by the fit command above
    • x_transformed = scaler.fit_transform(x) - this will run both of the above steps in one call (fit, then transform). This should only be applied to your training data.
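A minimal sketch of the three methods, using made-up training prices (min 10, max 50) and two in-range test prices:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: training prices span 10-50
x = np.array([[10.0], [25.0], [50.0]])   # training data
y = np.array([[30.0], [40.0]])           # test data within the training range

scaler = MinMaxScaler()
x_transformed = scaler.fit_transform(x)  # fit the min/max on training data only
y_transformed = scaler.transform(y)      # reuse the training min/max

print(x_transformed.ravel())  # [0.    0.375 1.   ]
print(y_transformed.ravel())  # [0.5  0.75]
```

Each value is mapped as (value - 10) / (50 - 10), so the training data lands exactly in [0, 1].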

    So, in essence, when you are training your model, you will be training on data which is strictly in the range 0-1, because your scaler was fit to that data. When you get new data, or have data in your test/validation sets that falls outside that range, scaler.transform(y) will simply return values outside the 0-1 range (i.e. values such as 1.1 or -0.4). The scaler does not refit itself or overwrite its stored maximum.
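To see this concretely, here is a sketch with test values outside the training range (the prices are made up):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train = np.array([[10.0], [50.0]])  # training min 10, max 50
test = np.array([[60.0], [2.0]])    # both values outside the training range

scaler = MinMaxScaler().fit(train)
# (60 - 10) / 40 = 1.25 and (2 - 10) / 40 = -0.2
print(scaler.transform(test).ravel())  # [ 1.25 -0.2 ]
```

The stored minimum and maximum stay at 10 and 50; out-of-range inputs just map outside [0, 1].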

    If this is an issue for the algorithm you're using, I would recommend either clipping the transformed data to 0-1 anyway, or fitting the scaler on a wider range than your training data actually covers, in anticipation of larger values at test time.