I'm trying to run a repeated 4-fold cross-validated linear regression on a dataset with 28 samples, but the call to train() fails with the error shown at the end of this session. Here are the data and the code:
> data1
X1 X2 X3 outcome
1 7 0 180 108
2 130 0 35 104
3 0 0 3 97
4 23 0 0 11
5 122 0 383 16
6 103 0 272 74
7 403 0 0 58
8 127 0 0 16
9 35 0 268 52
10 353 10 420 49
11 211 0 220 47
12 28 0 18 50
13 210 0 603 39
14 260 1 313 37
15 5 0 468 29
16 40 0 9 10
17 255 0 229 33
18 254 6 205 29
19 4 28 165 44
20 225 0 147 14
21 339 0 0 23
22 347 2 324 20
23 214 3 313 16
24 73 4 386 13
25 297 0 369 118
26 248 0 492 92
27 89 0 0 87
28 5 0 9 80
> set.seed(123)
> train.control <- trainControl(method = "repeatedcv", number = 4, repeats = 3)
> model <- train(data1$outcome ~., data = data1, method = "lm",trControl = train.control)
Error in `[.data.frame`(data, , all.vars(Terms), drop = FALSE) :
undefined columns selected
I also tried removing the outcome column from the data (data = data1[,-4]), but I still get the same error. Can you help me with this?
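Use the bare column name on the left-hand side of the formula you pass to train(): write outcome ~ . rather than data1$outcome ~ . in the call. With data1$outcome in the formula, data1 itself is treated as a variable name that has to be found as a column of data. No such column exists, which is most likely what triggers "undefined columns selected". That would also explain why dropping the outcome column did not help. The corrected call: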
library(caret)
set.seed(123)
train.control <- trainControl(method = "repeatedcv", number = 4, repeats = 3)
model <- train(outcome ~ ., data = data1, method = "lm", trControl = train.control)
model
#Linear Regression
#28 samples
# 3 predictor
#No pre-processing
#Resampling: Cross-Validated (4 fold, repeated 3 times)
#Summary of sample sizes: 20, 21, 22, 21, 22, 20, ...
#Resampling results:
# RMSE Rsquared MAE
# 38.78937 0.08910678 33.24453
#Tuning parameter 'intercept' was held constant at a value of TRUE
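If you want to check for yourself why the original call fails, here is a minimal sketch using only base R. It lists the variable names each formula refers to and compares them with the column names from the question. The names f_bad, f_good and cols are just illustrative, and the exact way caret processes the formula internally is an assumption inferred from the error message (which mentions all.vars(Terms)).

```r
# Base R only. A formula is not evaluated when it is created,
# so data1 does not need to exist for this check.
f_bad  <- data1$outcome ~ .   # the form used in the question
f_good <- outcome ~ .         # the form used in the answer

vars_bad  <- all.vars(f_bad)
vars_good <- all.vars(f_good)

# Column names of the data frame shown in the question
cols <- c("X1", "X2", "X3", "outcome")

# Names mentioned in each formula that are NOT columns of the data.
# "." is only the placeholder for "all other columns", so it is excluded.
missing_bad  <- setdiff(vars_bad,  c(cols, "."))
missing_good <- setdiff(vars_good, c(cols, "."))

print(missing_bad)    # expected to contain "data1"
print(missing_good)   # expected to be empty
```

Any name left over in missing_bad is something the formula interface would try to select as a column of data, so an empty result is what you want before calling train().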