
Using the type = "raw" option for the predict() function after repeated cross validation for logistic lasso regression returns empty vector


I used the caret and glmnet packages to run a lasso logistic regression, using repeated cross-validation to select the best lambda.

glmnet.obj <- train(outcome ~ .,
                     data = df.train,
                     method = "glmnet",
                     metric = "ROC",
                     family = "binomial",
                     trControl = trainControl(
                                          method = "repeatedcv",
                                          repeats = 10,
                                          number = 10,
                                          summaryFunction = twoClassSummary,
                                          classProbs = TRUE,
                                          savePredictions = "all",
                                          selectionFunction = "best"))

After that, I get the best lambda and alpha:

best_lambda<- get_best_result(glmnet.obj)$lambda 
best_alpha<- get_best_result(glmnet.obj)$alpha 

Then I obtain the predicted probabilities for the test set:

pred_prob<- predict(glmnet.obj,s=best_lambda, alpha=best_alpha, type="prob", newx = x.test)

Then I try to get the predicted classes, which I intend to pass to confusionMatrix():

pred_class<-predict(glmnet.obj,s=best_lambda, alpha=best_alpha, type="raw",newx=x.test)

But when I just run pred_class it returns NULL.

What could I be missing here?


Solution

  • You need to use newdata = rather than newx =, because predict(glmnet.obj) dispatches to predict.train for the caret object, not to glmnet's own predict method.
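
    To illustrate why a misnamed argument is silently ignored instead of raising an error, here is a minimal base-R sketch of S3 dispatch. The class toy_train and its method are hypothetical stand-ins, not caret's actual code; the only detail borrowed from caret is that predict.train() takes newdata followed by ... in its signature.

    ```r
    # Toy S3 object standing in for a caret "train" object (hypothetical).
    toy_fit <- structure(list(), class = "toy_train")

    # Like predict.train(), this method names its data argument `newdata`
    # and accepts any extra arguments through `...`.
    predict.toy_train <- function(object, newdata = NULL, ...) {
      if (is.null(newdata)) {
        return("newdata was not supplied")
      }
      paste("predicting on", nrow(newdata), "rows")
    }

    test_df <- data.frame(a = 1:3)

    # `newx` matches no named argument, so it is absorbed by `...` and ignored.
    res_newx <- predict(toy_fit, newx = test_df)

    # `newdata` is matched by name and reaches the method body.
    res_newdata <- predict(toy_fit, newdata = test_df)

    print(res_newx)
    print(res_newdata)
    ```

    In the toy version the test data never reaches the method when it is passed as newx, which is the same kind of silent mismatch as in the question.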

    You did not provide get_best_result(), but I assume it comes from this source:

    get_best_result = function(caret_fit) {
      best = which(rownames(caret_fit$results) == rownames(caret_fit$bestTune))
      best_result = caret_fit$results[best, ]
      rownames(best_result) = NULL
      best_result
    }
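
    To see what this helper returns without fitting a model, the sketch below applies it to a hand-made list that mimics the results and bestTune elements of a train object. The list mock_fit and all values in it are made up for illustration.

    ```r
    get_best_result <- function(caret_fit) {
      best <- which(rownames(caret_fit$results) == rownames(caret_fit$bestTune))
      best_result <- caret_fit$results[best, ]
      rownames(best_result) <- NULL
      best_result
    }

    # Hypothetical stand-in for a fitted train object: one row per tuning combination.
    mock_fit <- list(
      results = data.frame(alpha  = c(0.10, 0.55, 1.00),
                           lambda = c(0.01, 0.02, 0.03),
                           ROC    = c(0.61, 0.70, 0.66))
    )

    # bestTune keeps the row name of the winning row in `results` (row "2" here).
    mock_fit$bestTune <- mock_fit$results[2, c("alpha", "lambda")]

    best <- get_best_result(mock_fit)
    print(best)
    ```

    The helper relies on the row names of bestTune matching a row of results, so it returns the full row, including the performance metric, for the selected tuning parameters.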
    

    Using some example data:

    set.seed(111)
    df = data.frame(outcome = factor(sample(c("y","n"),100,replace=TRUE)),
    matrix(rnorm(1000),ncol=10))
    # rename the predictors on df itself; df.train is only created below
    colnames(df)[-1] = paste0("col",1:10)
    
    df.train = df[1:70,]
    x.test = df[71:100,]
    

    After fitting your model on this data, you can predict the classes with:

    pred_class<-predict(glmnet.obj,type="raw",newdata=x.test)
    
    confusionMatrix(table(pred_class,x.test$outcome))
    Confusion Matrix and Statistics
    
              
    pred_class  n  y
             n  1  5
             y 11 13
    

    The arguments s = and newx = come from glmnet. You can use them on glmnet.obj$finalModel, but you need to convert the data into a matrix first, for example:

    predict(glmnet.obj$finalModel,s=best_lambda, alpha=best_alpha, 
    type="class",newx=as.matrix(x.test[,-1]))
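
    The as.matrix(x.test[,-1]) conversion works here because every predictor is numeric. If some predictors were factors, one option is to build the matrix with base R's model.matrix(), which dummy-codes them. The sketch below uses a made-up data frame called toy; whether the resulting column names and order match what the final model was trained on is an assumption you would need to check against your own fit.

    ```r
    # Made-up test data with one numeric and one factor predictor.
    toy <- data.frame(
      outcome = factor(c("y", "n", "y", "n")),
      col1    = c(0.5, -1.2, 0.3, 2.0),
      grp     = factor(c("a", "b", "b", "a"))
    )

    # as.matrix() on mixed column types yields a character matrix.
    bad_x <- as.matrix(toy[, -1])

    # model.matrix() dummy-codes the factor; drop the intercept column.
    good_x <- model.matrix(outcome ~ ., data = toy)[, -1, drop = FALSE]

    print(is.numeric(bad_x))
    print(colnames(good_x))
    ```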