Tags: r, linear-regression, cross-validation, r-caret

Error "`[.data.frame`(data, , all.vars(Terms), drop = FALSE): undefined columns selected" in caret for repeated k-fold regression?


I'm trying to run a repeated 4-fold cross-validated regression on a dataset with 28 samples. The data is shown first, followed by the call to train() and the error it produces:

> data1
     X1  X2   X3  outcome
1     7   0  180      108
2   130   0   35      104
3     0   0    3       97
4    23   0    0       11
5   122   0  383       16
6   103   0  272       74
7   403   0    0       58
8   127   0    0       16
9    35   0  268       52
10  353  10  420       49
11  211   0  220       47
12   28   0   18       50
13  210   0  603       39
14  260   1  313       37
15    5   0  468       29
16   40   0    9       10
17  255   0  229       33
18  254   6  205       29
19    4  28  165       44
20  225   0  147       14
21  339   0    0       23
22  347   2  324       20
23  214   3  313       16
24   73   4  386       13
25  297   0  369      118
26  248   0  492       92
27   89   0    0       87
28    5   0    9       80

> set.seed(123)
> train.control <- trainControl(method = "repeatedcv", number = 4, repeats = 3)
> model <- train(data1$outcome ~., data = data1, method = "lm",trControl = train.control)
Error in `[.data.frame`(data, , all.vars(Terms), drop = FALSE) : 
  undefined columns selected

I also tried removing the outcome column (data = data1[,-4]), but I still get the same error. Can you help me with this?


Solution

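  • Refer to the outcome by its bare column name in the formula passed to train(): write outcome ~ . instead of data1$outcome ~ . when data = data1 is supplied. Judging by the call in the error message, caret selects every variable named in the formula as a column of data, and data1$outcome contributes the name data1, which is not a column. That is also why dropping the outcome column from data does not help.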

    library(caret)
    set.seed(123)
    train.control <- trainControl(method = "repeatedcv", number = 4, repeats = 3)
    model <- train(outcome ~ ., data = data1, method = "lm", trControl = train.control)
    model
    #Linear Regression 
    
    #28 samples
    # 3 predictor
    
    #No pre-processing
    #Resampling: Cross-Validated (4 fold, repeated 3 times) 
    #Summary of sample sizes: 20, 21, 22, 21, 22, 20, ... 
    #Resampling results:
    
    #  RMSE      Rsquared    MAE     
    #  38.78937  0.08910678  33.24453
    
    #Tuning parameter 'intercept' was held constant at a value of TRUE
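
To see where the undefined column comes from, the lookup can be approximated in base R without caret. This is a minimal sketch, not caret's actual code: it assumes that train() does something close to data[, all.vars(Terms), drop = FALSE], as the error message suggests, and it uses only the first four rows of data1 as a stand-in. The names bad_vars, good_vars and bad_msg are illustrative.

```r
# Small stand-in for data1 (first four rows of the question's data).
data1 <- data.frame(
  X1 = c(7, 130, 0, 23),
  X2 = c(0, 0, 0, 0),
  X3 = c(180, 35, 3, 0),
  outcome = c(108, 104, 97, 11)
)

# Variable names referenced by each formula once "." is expanded against data1.
bad_vars  <- all.vars(terms(data1$outcome ~ ., data = data1))
good_vars <- all.vars(terms(outcome ~ ., data = data1))

# Names the formula refers to that are not columns of data1.
print(setdiff(bad_vars, names(data1)))
print(setdiff(good_vars, names(data1)))

# Assumed approximation of the column lookup shown in the error message:
# selecting a name that is not a column fails.
bad_msg <- tryCatch(
  {
    data1[, bad_vars, drop = FALSE]
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(bad_msg)
```

With data1$outcome on the left-hand side, the formula mentions the data frame name itself, so the column selection fails however the columns of data are arranged. With the bare name outcome, every variable in the formula is a column of data1.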