I am wondering how the function normalizeFeatures() works together with a resampling strategy. Which of these statements is true?
1) normalizeFeatures() normalizes the whole task before resampling.
2) The normalization parameters are learnt on each training split and then applied to the corresponding test split (as mlrCPO::retrafo does in some way).
Thank you for your help!
The function normalizeFeatures() can be called on a data.frame or on a Task object. In both cases it does the same thing: it simply normalizes the whole data set at once, with no notion of training/test splits. So statement 1) is true.
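To see the whole-task behaviour concretely, here is a minimal sketch (the method name "standardize" is mlr's built-in z-scoring; the point is only that the statistics are computed over all rows):

library(mlr)
# normalizeFeatures() standardizes the complete task in one go, so any
# resampling performed afterwards works on data that was scaled using
# statistics computed over ALL rows -- including the later test rows
task = normalizeFeatures(iris.task, method = "standardize")
# every feature now has mean ~0 and sd ~1 across the entire data set
colMeans(getTaskData(task, target.extra = TRUE)$data)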
If you want to achieve the second, you have two options:
preprocWrapperCaret
The wrapper puts the scaling step in front of both training and prediction. During training, the scaling parameters are computed on the training split and saved; during prediction, the saved parameters are applied to the test data.
library(mlr)
lrn = makeLearner("classif.svm")
lrn = makePreprocWrapperCaret(lrn, ppc.center = TRUE, ppc.scale = TRUE)
set.seed(1)
res = resample(lrn, iris.task, resampling = hout, models = TRUE)
# the scaling parameters learnt on the training split
res$models[[1]]$learner.model$control$mean
Sepal.Length Sepal.Width Petal.Length Petal.Width
5.831 3.030 3.782 1.222
res$models[[1]]$learner.model$control$std
Sepal.Length Sepal.Width Petal.Length Petal.Width
0.8611356 0.4118203 1.7487877 0.7710127
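Conceptually, prediction-time scaling is equivalent to the following sketch using base R's scale(); the access path to control$mean and control$std is the one shown above:

# standardize new observations with the parameters stored from the
# training split -- they are NOT recomputed on the test data
new_obs = iris[1:3, 1:4]
scale(new_obs,
      center = res$models[[1]]$learner.model$control$mean,
      scale  = res$models[[1]]$learner.model$control$std)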
mlrCPO
A slightly more elegant and flexible approach is to build a preprocessing pipeline with the mlrCPO package, which in this case has the same effect as the wrapper.
library(mlr)
library(mlrCPO)
lrn = cpoScale(center = TRUE, scale = TRUE) %>>% makeLearner("classif.svm")
set.seed(1)
res = resample(lrn, iris.task, resampling = hout, models = TRUE)
# the scaling parameters learnt on the training split
res$models[[1]]$learner.model$retrafo$element$state
$center
Sepal.Length Sepal.Width Petal.Length Petal.Width
5.831 3.030 3.782 1.222
$scale
Sepal.Length Sepal.Width Petal.Length Petal.Width
0.8611356 0.4118203 1.7487877 0.7710127
I set the seed so that both approaches use the same training split, which is why the learnt scaling parameters are identical in both cases.
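Since the question mentions mlrCPO::retrafo: the fitted preprocessing step can also be extracted as a retrafo object and applied to new data directly. A sketch, assuming the usual mlrCPO idiom of retrieving the retrafo from a model trained with a CPO-attached learner:

# retrafo() pulls the fitted scaling step out of the trained model;
# applying it via %>>% scales new data with the saved center/scale values
rt = retrafo(res$models[[1]])
head(iris[, 1:4] %>>% rt)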