R caret substitute column names


I want to substitute a column name inside the train function of the caret package. To do this, I replaced the name of the target variable, target, with eval(parse(text = paste0(targetname))). With the randomForest function itself this worked, but with train it produces an error:

library(caret)
library(randomForest)
dat <- data.frame(target = c(2.5, 4.5, 6.1, 3.2, 2.2),
                  A = c(1.3, 4.4, 5.5, 6.7, 8.1),
                  B = c(44.5, 50.1, 23.7, 89.2, 10.5),
                  C = c("A", "A", "B", "B", "B"))

targetname <- "target"

set.seed(42)
model <- train(eval(parse(text = paste0(targetname))) ~ A + B + C,
               data = dat,
               method = "rf",
               ntree = 250,
               metric = "RMSE")

This code produces the following error:

Error in `[.data.frame`(data, , all.vars(Terms), drop = FALSE) : undefined columns selected

What expression that takes the column name from targetname could I write instead of eval(parse(text = paste0(targetname)))?


Solution

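  • You could build the formula as a string and convert it with formula(), so the column name is substituted before train sees it: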

    formula(paste(targetname, " ~ A + B + C"))
    # target ~ A + B + C
    

    as in

    model <- train(formula(paste(targetname, " ~ A + B + C")),
                   data = dat,
                   method = "rf",
                   ntree = 250,
                   metric = "RMSE")
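
Why the original call fails, and one alternative way to build the formula, can be sketched in base R. This is a minimal sketch, not the answer's own method: it swaps in reformulate() from the stats package in place of paste(), and the character vector predictors is a hypothetical name introduced for illustration. all.vars() lists the names a formula refers to, and, going by the error message, train selects exactly those columns from data.

```r
# targetname comes from the question; predictors is a hypothetical
# helper vector introduced only for this sketch.
targetname <- "target"
predictors <- c("A", "B", "C")

# The formula from the question. Building it does not evaluate the
# left-hand side, so the formula still refers to the symbol `targetname`.
bad <- eval(parse(text = paste0(targetname))) ~ A + B + C
all.vars(bad)
# "targetname" "A" "B" "C"   (dat has no column called "targetname")

# Alternative: construct the formula from strings with reformulate().
f <- reformulate(predictors, response = targetname)
all.vars(f)
# "target" "A" "B" "C"       (all of these are columns of dat)
```

Assuming dat is defined as in the question, f could then be passed to train in place of the pasted formula, for example train(f, data = dat, method = "rf", ntree = 250, metric = "RMSE").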