
How does the wrapper normalizeFeatures behave with a validation set?


I am wondering how the function normalizeFeatures works along with a resampling strategy. Which of these statements is true?

  1. The whole task data is normalized
  2. The training data is normalized, and the parameters of that normalization (say, the mean and standard deviation in a classic standardization) are then used to normalize the validation data (which is roughly what mlrCPO::retrafo does).

Thank you for your help!


Solution

  • The function normalizeFeatures() can be called on a data.frame or a Task object. In both cases it behaves the same: it simply normalizes the whole data set at once, with no train/validation distinction. So statement 1 is true.
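
    A minimal sketch of that behavior (standardizing the built-in iris.task; after the call, each feature has mean 0 and sd 1 across all rows, not just a training split):

    ```r
    library(mlr)

    # normalizeFeatures() standardizes the complete task data at once;
    # there is no train/validation distinction.
    norm.task = normalizeFeatures(iris.task, method = "standardize")

    # each of the four numeric features now has mean 0 over all 150 rows
    colMeans(getTaskData(norm.task)[, 1:4])
    ```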

    If you want the second behavior, you have two options:

    a) preprocWrapperCaret

    The wrapper puts the scaling step in front of both training and prediction. During training, the scaling parameters are estimated and stored; during prediction, the stored parameters are applied to the new data.

    library(mlr)
    lrn = makeLearner("classif.svm")
    lrn = makePreprocWrapperCaret(lrn, ppc.center = TRUE, ppc.scale = TRUE)
    
    set.seed(1)
    res = resample(lrn, iris.task, resampling = hout, models = TRUE)
    
    # the scaling parameters learnt on the training split
    res$models[[1]]$learner.model$control$mean
    
    Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
           5.831        3.030        3.782        1.222 
    
    res$models[[1]]$learner.model$control$std
    
    Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       0.8611356    0.4118203    1.7487877    0.7710127 
    

    b) mlrCPO

    A slightly more elegant and flexible approach is to build a preprocessing pipeline with the mlrCPO package, which has the same effect as the wrapper in this case.

    library(mlr)
    library(mlrCPO)
    lrn = cpoScale(center = TRUE, scale = TRUE) %>>% makeLearner("classif.svm")
    set.seed(1)
    res = resample(lrn, iris.task, resampling = hout, models = TRUE)
    # the scaling parameters learnt on the training split
    res$models[[1]]$learner.model$retrafo$element$state
    
    $center
    Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
           5.831        3.030        3.782        1.222 
    
    $scale
    Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       0.8611356    0.4118203    1.7487877    0.7710127 
    

    I set the seed so that both cases use the same training split; therefore the learnt scaling parameters are identical for both approaches.
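
    For completeness, the retrafo mechanism mentioned in the question can also be used directly on data, outside of resample(): applying a CPO to training data attaches a retrafo object, which then re-applies the training parameters to new data. A small sketch (the 100/50 row split of iris is just for illustration):

    ```r
    library(mlrCPO)

    train = iris[1:100, 1:4]
    test  = iris[101:150, 1:4]

    # applying the CPO to the training data learns center/scale there
    scaled.train = train %>>% cpoScale(center = TRUE, scale = TRUE)

    # the retrafo stores the training means and sds ...
    rt = retrafo(scaled.train)

    # ... and re-applies them to the test data without re-estimating them,
    # so the test columns are generally not centered at 0
    scaled.test = test %>>% rt
    ```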