Tags: python, dataframe, tensorflow, scikit-learn, sklearn-pandas

Should the same Min and Max be applied for Training and Prediction on a DataFrame?


I am applying sklearn.preprocessing.MinMaxScaler() to a DataFrame and using the scaled DataFrame for machine learning. After training, I use a separate script and a separate DataFrame to make predictions. In the prediction script I apply a new MinMaxScaler() to the DataFrame I want to predict on, so the training DataFrame and the prediction DataFrame end up scaled with different min and max values. Should the training and prediction DataFrames be scaled with the same min and max values in order to get an accurate prediction?


Solution

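  • Yes. Fit a single MinMaxScaler() on the training data, then reuse that fitted scaler (call transform(), not fit_transform()) on the test or prediction data. If prediction runs in a separate script, save the fitted scaler after training and load it there.

    Explanation: Assume a feature in the training data has min=10 and max=20, while the same feature in the test data has min=1 and max=10. The training scaler maps 10 to 0.0 and 20 to 1.0. If a separate scaler is fit on the test data, it maps 1 to 0.0 and 10 to 1.0, so a raw value of 10 gets the same scaled value (1.0) that a raw 20 had during training. The model then sees test inputs that look like the top of the training range even though they lie at or below its bottom. With the training scaler, the test values map to -0.9 through 0.0, which correctly places them below the training range.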
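A minimal sketch of the fit-once, reuse-everywhere workflow, using the numbers from the explanation. The single-column DataFrames, the column name `feature`, and the file name `scaler.pkl` are hypothetical; pickle is used here to carry the fitted scaler from the "training script" to the "prediction script" (joblib is a common alternative).

```python
import os
import pickle
import tempfile

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data matching the example: training values span 10..20,
# prediction values span 1..10.
train_df = pd.DataFrame({"feature": [10.0, 15.0, 20.0]})
predict_df = pd.DataFrame({"feature": [1.0, 5.5, 10.0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    scaler_path = os.path.join(tmp_dir, "scaler.pkl")  # hypothetical file name

    # Training script: learn min/max from the training data only.
    scaler = MinMaxScaler()
    train_scaled = scaler.fit_transform(train_df)
    with open(scaler_path, "wb") as f:
        pickle.dump(scaler, f)

    # Prediction script: reload the fitted scaler and only call transform().
    with open(scaler_path, "rb") as f:
        loaded_scaler = pickle.load(f)
    predict_scaled = loaded_scaler.transform(predict_df)

# For contrast: a scaler fit separately on the prediction data (the mistake).
separate_scaled = MinMaxScaler().fit_transform(predict_df)

print("train, fitted scaler:     ", train_scaled.ravel())
print("predict, training scaler: ", predict_scaled.ravel())
print("predict, separate scaler: ", separate_scaled.ravel())
```

With the reused scaler the prediction values come out as roughly -0.9, -0.45 and 0.0, below the training range as they should be. The separately fitted scaler instead returns 0.0, 0.5 and 1.0, which is indistinguishable from the scaled training data even though the raw values are much smaller.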