r, feature-selection, glmnet, stability

Use penalty.factor in stabsel in R


I'd like to use stabsel on top of glmnet lasso for variable selection. I was following the examples on https://github.com/hofnerb/stabs and it works fine.

However, I'd also like to force several variables to be included. This can be achieved in glmnet with the 'penalty.factor' parameter, but passing this parameter to stabsel via args.fitfun results in an error (see below).

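library(stabs)   # provides stabsel() and glmnet.lasso
library(glmnet)

data("bodyfat", package = "TH.data")
# penalty.factor: 0 = unpenalized (always included), 1 = usual lasso penalty
pfac <- c(0, 0, 0, 1, 0, 1, 1, 1, 1)
stab.glmnet <- stabsel(x = bodyfat[, -2], y = bodyfat[, 2],
                       fitfun = glmnet.lasso, cutoff = 0.75,
                       PFER = 1, args.fitfun = list(penalty.factor = pfac))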
Error in res[[1]] : subscript out of bounds
In addition: Warning message:
In run_stabsel(fitter = fit_model, args.fitter = args.fitfun, n = n,      :
100 fold(s) encountered an error. Results are based on 0 folds only.
Original error message(s):
Error : Matrices must have same number of columns in rbind2(.Call(dense_to_Csparse, x), y)
Error : Matrices must have same number of columns in rbind2(.Call(dense_to_Csparse, x), y)
Error : Matrices must have same number of columns in rbind2(.Call(dense_to_Csparse, x), y)
Error : Matrices must have same number of columns in rbind2(.Call(dense_to_Csparse, x), y)
Error : Matrices must have same number of columns in rbind2(.Call(dense_to_Csparse, x), y)
Error : Matrices must have same number of columns in rbind2(.Call(dense_to_Csparse, x), y)
Error : Matrices must have same number of columns in rbind2(.Call(dense_to_Csparse, x), y)
Error : Matrices must have same number of columns in rbind2(.Call(dense_to_Csparse, x), y)
Error : Matrices must have same number of columns in rbind2(.Call(dense_to_Csparse, x), y)
Error : Matrices must have same number of columns in rbind2(.Call(dense_to_Csparse, x [... truncated]

Any help would be much appreciated!


Solution

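  • I figured it out. Setting penalty.factor to 0 forces those n variables into every model (here, the four zeros in pfac), so the three stabsel parameters (cutoff, PFER, q) must be chosen so that at least n variables can be selected in each resampling run. With cutoff = 0.75 and PFER = 1, the q that stabsel derives is too small to hold the forced variables, which is why every fold fails.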

    See https://github.com/hofnerb/stabs/blob/master/README.md and http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2010.00740.x/full for more details.
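
A minimal sketch of that adjustment, assuming the stabs, glmnet and TH.data packages are installed. The name n_forced and the choice q = n_forced + 1 are illustrative assumptions, not a recommendation from the stabs authors; stabsel_parameters() is called only to show which q the original cutoff/PFER pair implies.

```r
library(stabs)
library(glmnet)

data("bodyfat", package = "TH.data")

# Same penalty factors as in the question: 0 = always included.
pfac <- c(0, 0, 0, 1, 0, 1, 1, 1, 1)
n_forced <- sum(pfac == 0)   # number of unpenalized (forced) variables

# Inspect the q implied by the original settings (cutoff = 0.75, PFER = 1).
# If it is smaller than n_forced, the forced variables cannot all fit.
implied <- stabsel_parameters(p = length(pfac), cutoff = 0.75, PFER = 1)
print(implied$q)

# Specify q directly (illustrative value) together with cutoff and let
# stabsel derive the corresponding PFER bound.
stab.glmnet <- stabsel(x = bodyfat[, -2], y = bodyfat[, 2],
                       fitfun = glmnet.lasso,
                       q = n_forced + 1, cutoff = 0.75,
                       args.fitfun = list(penalty.factor = pfac))
print(stab.glmnet)
```

Because only two of cutoff, PFER and q can be fixed at once, raising q means accepting whatever PFER bound stabsel reports for it; it is worth checking that value in the printed result before relying on the selection.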