I am getting an odd error message when I use caret to train a glmnet model:
Error in `[.data.frame`(data, , lvls[1]) : undefined columns selected
I have used essentially the same code and the same predictors for an ordinal model (just with a different factor y), and it worked fine. That run took 400 core hours to compute, so I can't show it here.
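library(caret)   # createDataPartition(), trainControl(), train()
library(dplyr)   # %>% and select()
# Source a small subset of the data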
source("https://gist.githubusercontent.com/FredrikKarlssonSpeech/ebd9fccf1de6789a3f529cafc496a90c/raw/efc130e41c7d01d972d1c69e59bf8f5f5fea58fa/voice.R")
trainIndex <- createDataPartition(notna$RC, p = .75,
                                  list = FALSE,
                                  times = 1)
training <- notna[ trainIndex[,1],] %>%
  select(RC, FCoM_envel:ATrPS_freq, `Jitter->F0_abs_dif`:RPDE)
testing <- notna[-trainIndex[,1],] %>%
  select(RC, FCoM_envel:ATrPS_freq, `Jitter->F0_abs_dif`:RPDE)
fitControl <- trainControl(## 10-fold CV
                           method = "CV",
                           number = 10,
                           allowParallel = TRUE,
                           savePredictions = "final",
                           summaryFunction = twoClassSummary)
vtCVFit <- train(x = training[-1], y = training[,"RC"],
                 method = "glmnet",
                 trControl = fitControl,
                 preProcess = c("center", "scale"),
                 metric = "Kappa"
)
I can't find anything obviously wrong with the data. There are no NAs:
table(is.na(training))
FALSE
43166
and I don't see why it would try to index outside the number of columns.
Any suggestions?
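Remove summaryFunction=twoClassSummary from your trainControl() call; with that change the model trains for me. twoClassSummary needs class probabilities (it selects the column named after the first class level, which is the data[, lvls[1]] in the error), and those are only computed when classProbs=TRUE is set. The default summary function reports Accuracy and Kappa, so metric="Kappa" works: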
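As a rough illustration of where the error seems to come from, here is a base-R sketch with made-up predictions. The object names (preds, no_probs, with_probs) are hypothetical, and this only mimics the kind of data frame a summary function receives; it does not reproduce caret's internals.

```r
# Class levels as in the question's outcome RC
lvls <- c("NVT", "VT")

# Hypothetical hold-out predictions WITHOUT class probabilities:
# only observed and predicted classes are present
preds <- data.frame(
  obs  = factor(c("NVT", "VT", "VT"), levels = lvls),
  pred = factor(c("NVT", "NVT", "VT"), levels = lvls)
)

# Selecting the column named after the first level fails because no
# "NVT" column exists; tryCatch captures the message instead of stopping
no_probs <- tryCatch(preds[, lvls[1]],
                     error = function(e) conditionMessage(e))
print(no_probs)

# Once probability columns named after the levels exist,
# the same selection succeeds
preds$NVT <- c(0.8, 0.6, 0.3)
preds$VT  <- 1 - preds$NVT
with_probs <- preds[, lvls[1]]
print(with_probs)
```

Dropping the summary function, as in the call that follows, avoids that column lookup altogether.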
fitControl <- trainControl(## 10-fold CV
                           method = "CV",
                           number = 10,
                           allowParallel = TRUE,
                           savePredictions = "final")
vtCVFit <- train(x = training[-1], y = training[,"RC"],
                 method = "glmnet",
                 trControl = fitControl,
                 preProcess = c("center", "scale"),
                 metric = "Kappa")
print(vtCVFit)
# glmnet
#
# 113 samples
# 381 predictors
#   2 classes: 'NVT', 'VT'
#
# Pre-processing: centered (381), scaled (381)
# Resampling: Bootstrapped (25 reps)
# Summary of sample sizes: 113, 113, 113, 113, 113, 113, ...
# Resampling results across tuning parameters:
#
#   alpha  lambda      Accuracy   Kappa
#   0.10   0.01113752  0.5778182  0.1428393
#   0.10   0.03521993  0.5778182  0.1428393
#   0.10   0.11137520  0.5778182  0.1428393
#   0.55   0.01113752  0.5778182  0.1428393
#   0.55   0.03521993  0.5748248  0.1407333
#   0.55   0.11137520  0.5749980  0.1136131
#   1.00   0.01113752  0.5815391  0.1531280
#   1.00   0.03521993  0.5800217  0.1361240
#   1.00   0.11137520  0.5939621  0.1158007
#
# Kappa was used to select the optimal model using the largest value.
# The final values used for the model were alpha = 1 and lambda = 0.01113752.
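If the ROC-based summary is what you actually want, an alternative sketch is to keep twoClassSummary and ask caret for class probabilities with classProbs = TRUE, switching the metric to "ROC", since twoClassSummary reports ROC, Sens and Spec rather than Kappa. The example below uses simulated data from caret's twoClassSim() as a stand-in for the question's data, so the numbers will not match the output above. It assumes the caret, glmnet and pROC packages are installed, and the names sim, ctrl and fit are only illustrative.

```r
library(caret)

set.seed(1)
# Simulated two-class data standing in for the question's `training` set;
# the outcome column is called Class (levels "Class1" and "Class2")
sim <- twoClassSim(n = 200)

ctrl <- trainControl(method = "cv",                     # lowercase, as in caret's documentation
                     number = 5,
                     classProbs = TRUE,                 # twoClassSummary needs class probabilities
                     summaryFunction = twoClassSummary,
                     savePredictions = "final")

fit <- train(x = sim[, names(sim) != "Class"],
             y = sim$Class,
             method = "glmnet",
             trControl = ctrl,
             preProcess = c("center", "scale"),
             metric = "ROC")                            # a metric that twoClassSummary computes

# Resampled performance for each alpha/lambda combination
print(fit$results[, c("alpha", "lambda", "ROC", "Sens", "Spec")])
```

Note that classProbs = TRUE requires the factor levels to be valid R variable names; 'NVT' and 'VT' from the question satisfy that.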