python r machine-learning scikit-learn normalization

Trainable sklearn StandardScaler for R


Is there something similar in R that lets you fit a StandardScaler (producing features with mean 0 and standard deviation 1) to the training data and then use that fitted scaler to transform the test data? As far as I can tell, scale does not offer a way to transform test data based on the mean and standard deviation of the training data.

Snippet for Python:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

Since I'm pretty sure that this is the right way to do it (it avoids leaking information from the test set into the training set), I guess there is a simple solution that I'm just unable to find.


Solution

  • I believe that the scale function in R does what you are looking for. For your example, that would just be:

    X_train_scaled = scale(X_train)
    

    scale stores the column means and standard deviations of the training data as attributes on its result. You can then apply those training statistics to your test set by reading them back with attr from the scaled X_train:

    X_test_scaled = scale(X_test, center=attr(X_train_scaled, "scaled:center"), 
                                  scale=attr(X_train_scaled, "scaled:scale"))
    

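    This follows the same fit-on-train, transform-test procedure as the example that you posted. The numbers can differ slightly, though: scale divides by the sample standard deviation (denominator n - 1), while StandardScaler uses the population standard deviation (denominator n), so the values from R are smaller in magnitude by a factor of sqrt((n - 1) / n).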
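
To make the fit-then-transform idea concrete, here is a minimal standard-library Python sketch of what both approaches do: compute per-column statistics on the training rows only, then reuse them on the test rows. The names fit_scaler, transform, and sample_sd are hypothetical helpers made up for this illustration, not part of scikit-learn or R, and the data is toy data.

```python
import statistics

def fit_scaler(rows, sample_sd=True):
    """Return per-column (means, sds) computed from the given rows only.

    sample_sd=True uses the n - 1 denominator (as R's sd does);
    sample_sd=False uses the n denominator (as StandardScaler does).
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    sd = statistics.stdev if sample_sd else statistics.pstdev
    sds = [sd(col) for col in columns]
    return means, sds

def transform(rows, means, sds):
    """Center and scale rows with previously fitted statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

# Toy data (assumed for illustration): 3 training rows, 1 test row.
X_train = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
X_test = [[2.0, 30.0]]

# Fit on the training data only, then apply to both sets.
means, sds = fit_scaler(X_train)
X_train_scaled = transform(X_train, means, sds)
X_test_scaled = transform(X_test, means, sds)

# Same fit with the population standard deviation, for comparison.
means_pop, sds_pop = fit_scaler(X_train, sample_sd=False)
```

The means are identical in both variants; only the standard deviations differ, by the factor sqrt(n / (n - 1)), which shrinks toward 1 as the training set grows.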