When I pass a single numeric variable as the only independent variable to glmnet in caret, I get the error message "x should be a matrix with 2 or more columns". However, when I pass a single factor variable, the train function works as expected, and adding a factor variable to the single numeric variable also works. Why is this? It is causing me real problems. I know that glmnet needs a matrix rather than a data frame, but caret should take care of this conversion, as it clearly does for the factor variable. I also need to run my whole analysis consistently within the caret framework, and I need to keep my data as a data frame. Here is a sample; please ignore the warnings caused by the small number of observations, which are not relevant to this problem.
Any help would be much appreciated, as I am going crazy!
df <- structure(list(Y = structure(c(1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L), .Label = c("No",
"Yes"), class = "factor"), A = c("Yes", "Yes", "No", "No", "No",
"No", "No", "No", "No", "Yes", "No", "No", "Yes", "Yes", "N",
"No", "No", "No", "No", "No"), B = c(30, 6, 12, 12, 12, 12, 12,
4, 12, 32, 12, 12, 4, 24, 8, 12, 15, 6, 12, 12), C = structure(c(1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L,
1L, 2L, 2L), .Label = c("A", "B"), class = "factor")), .Names = c("Y",
"A", "B", "C"), row.names = c(NA, 20L), class = "data.frame")
library(caret)
# set up the grid
tuneGrid <- expand.grid(.alpha = seq(0, 1, 0.05), .lambda = seq(0, 2, 0.05))
## 10-fold CV ##
fitControl <- trainControl(method = 'cv', number = 10, classProbs = TRUE, summaryFunction = twoClassSummary)
#works with a single factor variable (ignore warnings based on small sample size)
train(Y ~ A, data=df[c("Y", "A")], method="glmnet",
family="binomial", trControl = fitControl, tuneGrid = tuneGrid, metric = "ROC")
# returns the error "x should be a matrix with 2 or more columns" when a single numeric independent variable is passed
train(Y ~ B, data=df[c("Y", "B")], method="glmnet",
family="binomial", trControl = fitControl, tuneGrid = tuneGrid, metric = "ROC")
# works when a factor variable is added to the numeric variable (ignore warnings based on small sample size)
train(Y ~ B + C, data=df[c("Y", "B", "C")], method="glmnet",
family="binomial", trControl = fitControl, tuneGrid = tuneGrid, metric = "ROC")
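glmnet requires its x matrix to have at least two columns. With the formula interface, caret converts the predictors to a model matrix and drops the intercept column, so a single numeric predictor becomes a one-column matrix and glmnet rejects it. A factor (or character) predictor is expanded into one dummy column for each level after the first; in the sample data A takes three values ("N", "No" and "Yes"), so Y ~ A produces two columns and passes the check. A workaround is to add a constant column of ones, which carries no information but gives the matrix the second column glmnet requires: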
df$ones <- rep(1, nrow(df))
train(Y ~ ones + B, data=df[c("Y", "B", "ones")], method="glmnet",
family="binomial", trControl = fitControl, tuneGrid = tuneGrid, metric = "ROC")
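To check in advance how many columns glmnet would receive for a given formula, a minimal sketch is to count the columns of model.matrix() after removing the intercept. This assumes caret's formula method builds its predictor matrix in essentially this way; the toy data frame and the helper n_predictor_cols() are hypothetical and only meant for illustration.

```r
# Hypothetical toy data (not the question's df): a numeric predictor B,
# a character predictor A with three distinct values, a two-level factor C,
# and a constant column of ones
toy <- data.frame(
  Y = factor(c("No", "Yes", "No", "No", "Yes", "No")),
  A = c("Yes", "No", "N", "No", "Yes", "No"),
  B = c(30, 6, 12, 4, 24, 8),
  C = factor(c("A", "A", "B", "B", "A", "B")),
  ones = 1
)

# Hypothetical helper: number of predictor columns left after dropping the
# intercept (assumed to mirror what caret passes to glmnet)
n_predictor_cols <- function(formula, data) {
  x <- model.matrix(formula, data)
  x <- x[, colnames(x) != "(Intercept)", drop = FALSE]
  ncol(x)
}

n_predictor_cols(Y ~ B, toy)         # 1: too few for glmnet
n_predictor_cols(Y ~ A, toy)         # 2: three values give two dummy columns
n_predictor_cols(Y ~ B + C, toy)     # 2: numeric plus one dummy column
n_predictor_cols(Y ~ ones + B, toy)  # 2: the constant-column workaround
```

Note that a two-level factor on its own would also yield only one column, so Y ~ C by itself would be expected to fail in the same way as Y ~ B.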