I have a data frame that consists of numerical and non-numerical variables. I am trying to fit a logistic regression model predicting my variable "risk" from all other variables, optimizing AUC using 6-fold cross-validation.
However, I want to center and scale only the numerical explanatory variables. My code raises no errors or warnings, but I can't figure out how to tell train() (through preProcess or in some other way) to center and scale just my numerical variables.
Here is the code:
```r
library(caret)

test <- train(risk ~ .,
              method = "glm",
              data = df,
              family = binomial(link = "logit"),
              preProcess = c("center", "scale"),
              trControl = trainControl(method = "cv",
                                       number = 6,
                                       classProbs = TRUE,
                                       summaryFunction = prSummary),
              metric = "AUC")
```
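As background on why the blanket option also touches the factors: as far as I know, the formula method of train() expands the predictors with model.matrix() (dropping the intercept) before preProcess runs, so each factor has already become numeric 0/1 dummy columns. The base-R sketch below uses a hypothetical toy data frame (`toy`, with made-up columns) to show that expansion and what scaling does to a dummy:

```r
# Hypothetical toy data: two numeric predictors, one factor, a factor outcome
toy <- data.frame(
  age    = c(25, 40, 35, 50, 45, 30),
  income = c(30, 60, 45, 80, 70, 40),
  region = factor(c("north", "south", "north", "east", "south", "east")),
  risk   = factor(c("low", "high", "low", "high", "high", "low"))
)

# Expand the predictors as a formula does, then drop the intercept column
mm <- model.matrix(risk ~ ., data = toy)[, -1]
print(colnames(mm))  # "age" "income" "regionnorth" "regionsouth"

# A blanket center/scale step treats the dummies like any other column,
# so they stop being 0/1 indicators
scaled <- scale(mm)
print(round(scaled[, "regionnorth"], 3))
```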
You could preprocess all numerical variables in the original df first and then apply train() to the scaled data frame:
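```r
library(dplyr)
library(caret)

# Scale every numeric column; as.numeric() drops the one-column matrix
# that scale() returns. risk is a factor (classProbs = TRUE requires one),
# so it is left untouched.
df <- df %>%
  dplyr::mutate(dplyr::across(dplyr::where(is.numeric), ~ as.numeric(scale(.x))))

# No preProcess argument: the numeric predictors are already scaled.
# prSummary reports "AUC" as the area under the precision-recall curve
# (it needs the MLmetrics package); twoClassSummary with metric = "ROC"
# gives the area under the ROC curve instead.
test <- train(risk ~ .,
              method = "glm",
              data = df,
              family = binomial(link = "logit"),
              trControl = trainControl(method = "cv",
                                       number = 6,
                                       classProbs = TRUE,
                                       summaryFunction = prSummary),
              metric = "AUC")
```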
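One caveat with scaling df up front: the means and standard deviations are computed from all rows, including the rows that later serve as held-out folds, so the cross-validated AUC can be slightly optimistic. A minimal base-R sketch of the leak-free idea is to estimate the statistics on the training rows only and apply them to both sets. The helper `scale_numeric_by_train()` and the toy data are hypothetical illustrations, not caret functions:

```r
# Hypothetical helper: center/scale numeric columns using statistics
# estimated on the training rows only, leaving factors untouched
scale_numeric_by_train <- function(train_df, test_df) {
  num_cols <- names(train_df)[vapply(train_df, is.numeric, logical(1))]
  for (col in num_cols) {
    mu  <- mean(train_df[[col]])
    sdv <- sd(train_df[[col]])
    train_df[[col]] <- (train_df[[col]] - mu) / sdv
    test_df[[col]]  <- (test_df[[col]] - mu) / sdv  # reuse training stats
  }
  list(train = train_df, test = test_df)
}

# Hypothetical toy data (risk is the factor outcome and is not scaled)
toy <- data.frame(
  age    = c(25, 40, 35, 50, 45, 30, 60, 55),
  income = c(30, 60, 45, 80, 70, 40, 90, 85),
  region = factor(c("north", "south", "north", "east",
                    "south", "east", "north", "east")),
  risk   = factor(c("low", "high", "low", "high",
                    "high", "low", "high", "high"))
)

# One illustrative split: first 6 rows train, last 2 rows held out
out <- scale_numeric_by_train(toy[1:6, ], toy[7:8, ])
print(round(colMeans(out$train[c("age", "income")]), 10))
print(out$test)
```

Inside caret itself, I believe the same effect can be had with the non-formula interface, train(x = <predictor data frame>, y = df$risk, preProcess = c("center", "scale"), ...): factor columns then stay factors, preProcess ignores non-numeric predictors, and the centering and scaling are re-estimated within each resample.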