Feature Scaling


I read the following advice in a post:

For feature scaling, you learn the means and standard deviation of the training set, and then:

  • Standardize the training set using the training set means and standard deviations.
  • Standardize any test set using the training set means and standard deviations.
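
The two steps quoted above can be sketched with NumPy. This is only a minimal illustration: the toy arrays and the variable names (`train_mean`, `train_std`, `X_train_scaled`, `X_test_scaled`) are made up for the example and do not come from the original post.

```python
import numpy as np

# Toy data, made up for illustration: rows are samples, columns are features.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[4.0, 40.0]])

# Learn the statistics from the training set only.
train_mean = X_train.mean(axis=0)
train_std = X_train.std(axis=0)

# Apply the same training-set statistics to both sets.
X_train_scaled = (X_train - train_mean) / train_std
X_test_scaled = (X_test - train_mean) / train_std
```

Note that the test set is never used to compute a mean or standard deviation; it is only transformed with the values learned from the training set.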

My question is: after fitting a model on the scaled training data, should I apply the fitted model to scaled or unscaled test data? Thanks!


Solution

  • Yes, you should also scale the test data. If you scaled your training data and fitted a model to it, the test set must go through the same preprocessing, using the means and standard deviations learned from the training set. This is standard practice: it ensures the model always receives input in the same form it was trained on.

    In Python, using scikit-learn's StandardScaler, the process might look as follows:

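    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)  # learn mean/std from the training set and scale it
    X_test = scaler.transform(X_test)        # reuse the training-set mean/std; do not refit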
    

    There is a detailed write-up on this topic in another thread that might be of interest to you.
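
To make the answer concrete, here is a hedged end-to-end sketch of fitting on scaled training data and predicting on scaled test data. The synthetic dataset, the choice of `LogisticRegression`, and the variable names are assumptions made for illustration; any estimator would follow the same pattern. The second half shows scikit-learn's `make_pipeline`, which bundles the scaler and the model so the test data is scaled automatically at prediction time.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, generated only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Manual route: fit the scaler on the training set, reuse it on the test set.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # no refitting on test data

model = LogisticRegression()
model.fit(X_train_scaled, y_train)
manual_pred = model.predict(X_test_scaled)  # predict on scaled test data

# Pipeline route: the same two steps bundled, so scaling cannot be skipped.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)
pipe_pred = pipe.predict(X_test)  # scaling is applied internally
```

Both routes apply the training-set scaling to the test data before prediction. The pipeline version is often more convenient because the raw test set can be passed in directly and the scaling step cannot be forgotten.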