I'm using the caret package to fit an LVQ model and select features on a dataset with 579 independent variables and 55 samples:
set.seed(123)
data=data
control <- trainControl(method="repeatedcv", number=5, repeats=10)
But when I run the command to train the model I get the following error:
model <- train(remission~., data=data, method="lvq", preProcess="scale", trControl=control, importance=T)
Error in seeds[[num_rs + 1L]] : subscript out of bounds
Can you suggest any solutions? Considering the number of variables I have, this seems the best way to find important features for my model. I even tried trimming my variables to 40 and 10, but I still get the same error.
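The function that generates the default tuning grid runs into problems on a small dataset; you can inspect it with getModelInfo("lvq")$lvq$grid (this has also been answered by the author of caret). The workaround is to supply your own grid through tuneGrid. Also note that importance=TRUE is not an option for this method. Here is a reproducible example with the golub data, first with the default grid (which fails) and then with a custom grid: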
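library(multtest)  # provides the golub data
library(caret)
data(golub)
# golub is genes x samples, so transpose to get one row per sample
data <- data.frame(t(golub))
data$cl <- factor(golub.cl)
control <- trainControl(method = "cv", number = 5)
# default tuning grid: fails
model <- train(cl ~ ., data = data, method = "lvq", preProcess = "scale", trControl = control)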
Error in seeds[[num_rs + 1L]] : subscript out of bounds
# explicit tuning grid: works
TG <- expand.grid(k = 1:3, size = seq(5, 20, by = 5))
model <- train(cl ~ ., data = data, method = "lvq", preProcess = "scale", trControl = control, tuneGrid = TG)
Learning Vector Quantization
38 samples
3051 predictors
2 classes: '0', '1'
Pre-processing: scaled (3051)
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 31, 30, 31, 29, 31
Resampling results across tuning parameters:
  k  size  Accuracy   Kappa
  1   5    0.9527778  0.8967033
  1  10    1.0000000  1.0000000
  1  15    0.9492063  0.8929766
  1  20    0.9206349  0.8461538
  2   5    1.0000000  1.0000000
  2  10    0.9206349  0.8321070
  2  15    0.9555556  0.8800000
  2  20    0.9714286  0.9391304
  3   5    0.9492063  0.8929766
  3  10    0.9555556  0.9000000
  3  15    0.9777778  0.9538462
  3  20    0.9527778  0.8967033
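
Two follow-up checks may be useful; treat this as a rough sketch rather than a definitive recipe. The first step calls the default grid generator directly on the golub predictors, on the assumption that it takes x, y and len like other caret model grid functions; if it returns zero rows, that would explain the subscript error. The second step ranks features with varImp() after fitting with a custom grid. As far as I know, lvq has no model-specific importance measure, so varImp() should fall back to caret's model-free filter scores (ROC-based for classification), which means the ranking does not depend on the fitted codebook. The names grid_fun, default_grid, imp and ranked are only illustrative.

```r
library(multtest)  # Bioconductor package that provides the golub data
library(caret)

data(golub)
data <- data.frame(t(golub))   # one row per sample
data$cl <- factor(golub.cl)

# 1. Call the default grid generator directly on the predictors and outcome.
grid_fun <- getModelInfo("lvq", regex = FALSE)$lvq$grid
x <- data[, names(data) != "cl"]
default_grid <- grid_fun(x = x, y = data$cl, len = 3)
print(nrow(default_grid))  # zero rows would mean train() has nothing to tune over

# 2. Fit with an explicit grid, then rank predictors with varImp().
set.seed(123)
control <- trainControl(method = "cv", number = 5)
TG <- expand.grid(k = 1:3, size = seq(5, 20, by = 5))
model <- train(cl ~ ., data = data, method = "lvq", preProcess = "scale",
               trControl = control, tuneGrid = TG)

# Assumption: with no lvq-specific method, these are filter-based scores.
imp <- varImp(model, scale = FALSE)
ranked <- imp$importance[order(imp$importance[[1]], decreasing = TRUE), , drop = FALSE]
print(head(ranked, 10))  # ten highest-scoring predictors
```

For the original 55-sample dataset, the same pattern should apply with remission as the outcome, provided the size and k values in the grid stay well below the number of samples in each training fold.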