r  machine-learning  replication  lasso-regression  reproducible-research

What is a simple-to-use library, besides elasticnet, that can fit LASSO regressions for output verification?


I have run a series of LASSO regressions in RStudio sequentially, one on each csv-formatted dataset within a file folder, using the enet() function from the elasticnet package in R with its lambda argument set to 0. However, as an important sanity check, I need to run the same number of LASSOs again on the same datasets, with the same random seed set, to confirm that the results are identical, so that my portion of this research project will be ready for publication in 2023.

But here is the tricky part: I cannot use the caret library. Model performance can be measured objectively and exactly in this context, because the datasets are synthetic: they were randomly generated via Monte Carlo simulations with known distributions, known statistical properties, and even known true underlying structural equations. Thus caret is not suitable here, as will be clear to anyone familiar with that library.

My current code, which does work, looks like this:

library(elasticnet)   # enet()
library(dplyr)        # select(), starts_with()

set.seed(11)     # to ensure replicability
# In enet(), lambda is the quadratic (ridge) penalty, so lambda = 0 gives a pure LASSO fit
LASSO_fits <- lapply(datasets, function(i) 
               enet(x = as.matrix(select(i, starts_with("X"))), 
                    y = i$Y, lambda = 0, normalize = FALSE))
# This stores and prints out all of the regression 
# equation specifications selected by LASSO when called
set.seed(11)     # to ensure replicability
# No x/newx argument is needed when type = "coefficients"
LASSO_Coeffs <- lapply(LASSO_fits, 
                       function(i) predict(i, 
                                           s = 0.1, mode = "fraction", 
                                           type = "coefficients")[["coefficients"]])

Positive_Coeffs <- lapply(LASSO_Coeffs, function(i) i[i > 0])

IVs_Selected_by_LASSO <- lapply(LASSO_Coeffs, function(i) names(i[i > 0]))

Presumably, I could run my replication LASSO regressions in such a way that the last two steps, namely Positive_Coeffs and IVs_Selected_by_LASSO, can be reused unchanged apart from the name of the object passed as their first argument.
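One way to make that reuse explicit is to wrap the selection step in a small function and compare the output of the two runs. The following is only a sketch: the helper name select_IVs and the toy coefficient vectors are hypothetical and merely stand in for the coefficient lists produced by the original and the replication fits.

```r
# Hypothetical helper: works on any list of named coefficient vectors,
# regardless of which package produced them
select_IVs <- function(coef_list) {
  lapply(coef_list, function(b) names(b[b > 0]))
}

# Toy stand-ins for the coefficient lists of the two runs (made-up values)
coeffs_original    <- list(c(X1 = 0.8, X2 = 0, X3 = 0.3))
coeffs_replication <- list(c(X1 = 0.7, X2 = 0, X3 = 0.4))

# TRUE when both runs select the same variables in every dataset
same_selection <- identical(select_IVs(coeffs_original),
                            select_IVs(coeffs_replication))
```

Comparing the selected variable names rather than the raw coefficients avoids false mismatches when two libraries scale their penalties differently.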


Solution

  • Use the glmnet() function from the glmnet package, for example as follows. Note that alpha = 1 selects the LASSO penalty, whereas alpha = 0 would fit a ridge regression:

    LASSO_fits <- lapply(datasets, function(i) 
                   glmnet(x = as.matrix(select(i, starts_with("X"))), 
                          y = i$Y, alpha = 1))
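To feed glmnet fits into the downstream selection step, the coefficients can be pulled out with coef(). The sketch below makes several assumptions that are not in the original post: the simulated datasets list is a made-up stand-in for the csv files, standardize = FALSE is chosen to mirror normalize = FALSE in the enet() call, and s = 0.1 is an arbitrary penalty value. In glmnet, s is the penalty lambda itself, not enet's "fraction" of the L1 norm, so the two are not on the same scale and matching the enet results exactly would require finding the corresponding lambda.

```r
library(glmnet)

set.seed(11)  # only fixes the toy data; glmnet() itself is deterministic
# Hypothetical stand-in for the list of datasets read from the csv files
datasets <- lapply(1:3, function(k) {
  X <- matrix(rnorm(100 * 5), nrow = 100,
              dimnames = list(NULL, paste0("X", 1:5)))
  data.frame(Y = 2 * X[, 1] - 1.5 * X[, 3] + rnorm(100), X)
})

# alpha = 1 is the LASSO penalty; standardize = FALSE is an assumption
# meant to mirror normalize = FALSE in the enet() call
LASSO_fits <- lapply(datasets, function(d)
  glmnet(x = as.matrix(d[, startsWith(names(d), "X")]),
         y = d$Y, alpha = 1, standardize = FALSE))

# Coefficients at penalty lambda = 0.1 (arbitrary), intercept dropped
LASSO_Coeffs <- lapply(LASSO_fits, function(fit) {
  b <- as.matrix(coef(fit, s = 0.1))[, 1]   # named numeric vector
  b[names(b) != "(Intercept)"]
})

# Variables with a nonzero coefficient are the ones LASSO selected
IVs_Selected_by_LASSO <- lapply(LASSO_Coeffs, function(b) names(b[b != 0]))
```

The filter b != 0 keeps negatively signed coefficients as well; the i > 0 filter used in the question keeps only the positive ones, so whichever rule is intended should be applied identically to both runs.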