I have a training set with some dummy variables [0], and I do not want to apply preProc=c("center","scale") to them.
I do, however, want to apply preProc=c("center","scale")
to all the non-dummy variables in order to normalize them, like here [1]. As I understand it, the center option subtracts each predictor's mean and the scale option divides by its standard deviation.
Would it make sense to build an array with all the non-dummy variables, calculate the mean and SD of each variable, center and scale all of the values, and then concatenate this array with another array that contains all the dummy variables, giving new_array,
and then train the model like this? Or would this not work?
ctrl <- trainControl(method = "repeatedcv", number=10, repeats=3)
knn_model <- train (Class ~ ., data=new_array, method="knn", trControl=ctrl)
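For illustration, the manual approach described above could be sketched in base R roughly as follows. The toy data and the names numeric_part, dummy_part, col_means and col_sds are hypothetical and not part of the original question; the outcome column (Class) would be attached to new_array in the same way.

```r
# Hypothetical toy data: two numeric predictors and two 0/1 dummy columns
set.seed(42)
numeric_part <- data.frame(
  height = rnorm(20, mean = 170, sd = 10),
  weight = rnorm(20, mean = 70, sd = 15)
)
dummy_part <- data.frame(
  is_red  = rbinom(20, 1, 0.5),
  is_blue = rbinom(20, 1, 0.5)
)

# Per-column mean and standard deviation of the numeric predictors
col_means <- sapply(numeric_part, mean)
col_sds   <- sapply(numeric_part, sd)

# Center (subtract the mean) and scale (divide by the SD) each numeric column
scaled_part <- as.data.frame(
  scale(numeric_part, center = col_means, scale = col_sds)
)

# Re-attach the untouched dummy columns
new_array <- cbind(scaled_part, dummy_part)
```

If the model is later used on new data, the same col_means and col_sds computed on the training set would have to be reused there, rather than recomputed.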
Note: I have already asked this question on Cross Validated, but since it is also relevant to Stack Overflow I am asking it again here.
[0] https://topepo.github.io/caret/pre-processing.html#dummy
You could do the following to keep everything within caret.
Let's say you have a data.frame called DF
whose columns 1:5 are numeric and whose columns 6:10 are factors. You could do the following:
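# Learn centering and scaling parameters from the numeric columns only
PreProcovCenter <- preProcess(DF[,1:5], method = c("center", "scale"))
# dummyVars() needs a formula; ~ . expands every factor column
PreProcovDummy <- dummyVars(~ ., data = DF[,6:10])
DF[,1:5] <- predict(PreProcovCenter, DF[,1:5])
DFDummy <- predict(PreProcovDummy, newdata = DF[,6:10])
# Drop the original factor columns so they are not encoded twice
DF <- cbind(DF[,-(6:10)], DFDummy)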
and finally:
knn_model <- train (Class ~ ., data=DF, method="knn", trControl=ctrl)
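As a self-contained illustration of the steps above, here is a minimal sketch on made-up data. The data frame contents and the names num_cols, fac_cols, pp_center, pp_dummy and DF_ready are assumptions for the example only; ctrl is defined as in the question.

```r
library(caret)

set.seed(1)
# Hypothetical toy data: two numeric predictors, one factor predictor, and the outcome
DF <- data.frame(
  x1     = rnorm(60, mean = 10, sd = 3),
  x2     = runif(60, min = 0, max = 100),
  colour = factor(sample(c("red", "green", "blue"), 60, replace = TRUE)),
  Class  = factor(sample(c("yes", "no"), 60, replace = TRUE))
)

num_cols <- c("x1", "x2")
fac_cols <- "colour"

# Center and scale only the numeric predictors
pp_center  <- preProcess(DF[, num_cols], method = c("center", "scale"))
num_scaled <- predict(pp_center, DF[, num_cols])

# Expand the factor predictors into 0/1 dummy columns
pp_dummy <- dummyVars(~ ., data = DF[, fac_cols, drop = FALSE])
dummies  <- predict(pp_dummy, newdata = DF[, fac_cols, drop = FALSE])

# Combine scaled numerics, unscaled dummies, and the outcome
DF_ready <- cbind(num_scaled, dummies, Class = DF$Class)

ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
knn_model <- train(Class ~ ., data = DF_ready, method = "knn", trControl = ctrl)
```

Keeping pp_center and pp_dummy around means the same transformations can be applied to new data with predict() before calling predict() on knn_model.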