Tags: r, logistic-regression, cross-validation, glmnet, lasso-regression

Error with cross validation and lasso regularization for logistic regression


I want to fit a logistic regression model with lasso regularization using 5-fold cross-validation, but I get this error message: "Something is wrong; all the Accuracy metric values are missing".

I started with logistic regression with lasso regularization by setting alpha=1. This works. I expanded from this example.

# Load package and data set
library(glmnet)
data("mtcars")

# Prepare data set: predictor matrix and a two-level outcome
x   <- model.matrix(~.-1, data= mtcars[,-1])
mpg <- ifelse( mtcars$mpg < mean(mtcars$mpg), 0, 1)
y   <- factor(mpg, labels = c('notEfficient', 'efficient'))

# Find the lambda that minimizes the cross-validated error
mod_cv <- cv.glmnet(x=x, y=y, family='binomial', alpha=1)

# Logistic regression with lasso regularization at that lambda
logistic_model <- glmnet(x, y, alpha=1, family = "binomial",
                         lambda = mod_cv$lambda.min)

I read that cv.glmnet already does 10-fold CV by default, but I want to use 5-fold CV. When I pass n_folds to cv.glmnet, I can no longer find the best lambda, and when I instead set up 5-fold CV through trControl in caret's train, the model cannot be fit either.

# Find the best lambda with 5-fold cv
mod_cv <- cv.glmnet(x=x, y=y, family='binomial', alpha=1, n_folds=5)

#Error in glmnet(x, y, weights = weights, offset = offset, lambda = lambda,  : 
#  unused argument (n_folds = 5)

# Logistic regression with 5-fold cv
library(caret)

# define training control
train_control <- trainControl(method = "cv", number = 5)

# train the model with 5-fold cv
model <- train(x, y, trControl = train_control, method = "glm", family="binomial", alpha=1)

#Something is wrong; all the Accuracy metric values are missing:
#    Accuracy       Kappa    
# Min.   : NA   Min.   : NA  
# 1st Qu.: NA   1st Qu.: NA  
# Median : NA   Median : NA  
# Mean   :NaN   Mean   :NaN  
# 3rd Qu.: NA   3rd Qu.: NA  
# Max.   : NA   Max.   : NA  
# NA's   :1     NA's   :1  

Why does the error arise when I add 5-fold cv?


Solution

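  • There are two problems in your code: 1) the fold-count argument of cv.glmnet is called nfolds, not n_folds, and 2) train has no alpha argument. train passes extra arguments on to the underlying glm fit, which does not accept alpha, so every resample fails and all metric values come back as NA. If you fix both, your code works: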

    # Load data set
    data("mtcars")
    library(glmnet)
    library(caret)
    
    # Prepare data set 
    x   <- model.matrix(~.-1, data= mtcars[,-1])
    mpg <- ifelse( mtcars$mpg < mean(mtcars$mpg), 0, 1)
    y   <- factor(mpg, labels = c('notEfficient', 'efficient'))
    
    # find the lambda that minimizes the cross-validated error (10 folds by default)
    mod_cv <- cv.glmnet(x=x, y=y, family='binomial', alpha=1)
    
    #logistic regression with lasso regularization
    logistic_model <- glmnet(x, y, alpha=1, family = "binomial",
                             lambda = mod_cv$lambda.min)
    
    # find the lambda that minimizes the cross-validated error, now with 5 folds
    mod_cv <- cv.glmnet(x=x, y=y, family='binomial', alpha=1, nfolds=5)
    
    # logistic regression with 5-fold cv
    # define training control
    train_control <- trainControl(method = "cv", number = 5)
    
    # train the model with 5-fold cv
    model <- train(x, y, trControl = train_control, method = "glm", family="binomial")
    model$results
    #>  parameter  Accuracy     Kappa AccuracySD   KappaSD
    #>1      none 0.8742857 0.7362213 0.07450517 0.1644257
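
One caveat about the train call above: method = "glm" fits an ordinary, unpenalized logistic regression, so the 5-fold accuracy it reports is not for a lasso model. If the goal is a lasso fit tuned through caret, one option is method = "glmnet" with a tuning grid that fixes alpha at 1. The following is a minimal sketch under assumptions that are not in the original answer: the seed, the names lasso_grid and lasso_model, and the lambda grid are illustrative choices only.

```r
library(glmnet)
library(caret)

# Same data preparation as in the question
data("mtcars")
x   <- model.matrix(~ . - 1, data = mtcars[, -1])
mpg <- ifelse(mtcars$mpg < mean(mtcars$mpg), 0, 1)
y   <- factor(mpg, labels = c("notEfficient", "efficient"))

set.seed(1)  # arbitrary seed so the random fold assignment is repeatable

# 5-fold cross-validation, as in the answer
train_control <- trainControl(method = "cv", number = 5)

# alpha = 1 selects the lasso penalty; the lambda values are an
# illustrative assumption, not a recommended default
lasso_grid <- expand.grid(alpha  = 1,
                          lambda = 10^seq(-3, 0, length.out = 20))

# caret infers the binomial family from the two-level factor outcome
lasso_model <- train(x, y,
                     method    = "glmnet",
                     trControl = train_control,
                     tuneGrid  = lasso_grid)

# Tuning values chosen by cross-validated accuracy
lasso_model$bestTune

# Coefficients of the final model at the chosen lambda
coef(lasso_model$finalModel, s = lasso_model$bestTune$lambda)
```

Here lasso_model$results holds one row of resampled Accuracy and Kappa per lambda in the grid. Note that caret picks lambda by accuracy by default, whereas cv.glmnet picks it by binomial deviance, so the two approaches need not select the same value.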