
Error while using Adaptive Lasso Function with Boot


I have the code below. The function works when I call it directly, but when I run it through boot() I get the error

Error in t.star[r, ] <- res[[r]] : number of items to replace is not a multiple of replacement length.

With smaller values of R, boot() sometimes runs without this error. What do I need to add to my function so that the error no longer occurs?


    alassoOLS_ydot_n10_fn <- function(data,index){ #index is the bootstrap sample index
      x <- data[index,-1]
      y <- data[index,1]
      cv.out <- cv.glmnet(x,y,alpha=1,nfolds=10, penalty.factor = 1 / abs(best_ridge_coef.ydot.n10)) #alpha=1, lasso
      bestlam <- cv.out$lambda.min #the best lambda chosen by CV
      lasso.mod <- glmnet(x,y,alpha=1,lambda=bestlam, penalty.factor = 1 / abs(best_ridge_coef.ydot.n10))
      coef <- as.vector(coef(lasso.mod))[-1]
      coef_nonzero <- coef != 0
      ls.obj <- lm(y ~x[, coef_nonzero, drop = FALSE])
      ls_coef <- (ls.obj$coefficients)[-1]
      return(ls_coef)
    }

    boot(ydot_matrix_n10, alassoOLS_ydot_n10_fn, R = 500)


Solution

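  • The length of the vector returned by alassoOLS_ydot_n10_fn is not constant: it depends on how many variables glmnet selects in each bootstrap sample. boot() first evaluates the statistic on the original data and allocates a matrix (t.star in the error message) with one column per element of that result, then copies each bootstrap replicate into one row of it. When a resample selects a different number of variables, the lengths no longer match and the assignment fails (or, if the row width happens to be a multiple of the shorter length, R silently recycles the values). The larger R is, the more likely it is that at least one resample selects a different number of variables, which is why smaller values of R sometimes run without error.
    I modified your function so that it always returns one coefficient per covariate: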

    alassoOLS_ydot_n10_fn <- function(data,index){ 
      x <- data[index,-1]
      y <- data[index,1]
      cv.out <- cv.glmnet(x,y,alpha=1,nfolds=10, penalty.factor = 1/abs(best_ridge_coef.ydot.n10)) # same weights as the final fit
      bestlam <- cv.out$lambda.min #the best lambda chosen by CV
      lasso.mod <- glmnet(x,y,alpha=1,lambda=bestlam, penalty.factor = 1/abs(best_ridge_coef.ydot.n10))
      coef <- as.vector(coef(lasso.mod))[-1]
      coef_nonzero <- coef != 0
      ls.obj <- lm(y ~x[, coef_nonzero, drop = FALSE])
      ls_coef <- (ls.obj$coefficients)[-1]
      # Return a fixed-length vector of OLS coefficients (one per covariate).
      # Covariates not selected by glmnet get a coefficient of 0.
      vect_coef <- rep(0,length(coef_nonzero))
      vect_coef[coef_nonzero] <- ls_coef
      return(vect_coef)
    }
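The mechanism behind the error can be reproduced in base R without glmnet or boot. In this sketch a small matrix named t.star mimics the replicate matrix that boot() fills (the name is borrowed from the error message; the sizes and values are made up), and pad_coef() is a hypothetical helper that does the same zero-padding as the last lines of the corrected function.

```r
# Toy stand-in for the replicate matrix boot() fills: one row per
# replicate, one column per element of the original statistic (3 here).
t.star <- matrix(NA_real_, nrow = 3, ncol = 3)

# Same length as the row: the assignment works.
t.star[1, ] <- c(0.10, 0.20, 0.30)

# Shorter than the row (2 vs 3): the same error boot() reports.
msg <- tryCatch({
  t.star[2, ] <- c(0.10, 0.20)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)  # "number of items to replace is not a multiple of replacement length"

# When the row width is a multiple of the shorter length, R recycles
# the values silently instead of failing.
wide <- matrix(NA_real_, nrow = 1, ncol = 4)
wide[1, ] <- c(0.10, 0.20)
print(wide[1, ])  # 0.1 0.2 0.1 0.2

# Hypothetical helper with the same zero-padding as the corrected
# function: selected covariates get their OLS coefficient, others get 0.
pad_coef <- function(selected, ls_coef) {
  out <- numeric(length(selected))  # one slot per covariate, all 0
  out[selected] <- ls_coef
  out
}
t.star[2, ] <- pad_coef(c(TRUE, FALSE, TRUE), c(0.10, 0.20))
print(t.star[2, ])  # 0.1 0.0 0.2
```

The silent-recycling case is worth keeping in mind: a statistic whose length only sometimes divides the width of t.star can produce wrong bootstrap results without any error at all.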
    

    The function now always returns one coefficient per covariate, with 0 for the covariates not selected by glmnet.
    (I don't know whether setting these to 0 is appropriate from a statistical point of view in your investigation;
    my aim is only to show the source of the error message given by boot.)
    With this change boot runs without errors, as in the following example.

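    library(boot)
    library(glmnet)
    set.seed(1)
    ydot_matrix_n10 <- matrix(runif(1000), ncol=10)  # column 1: response, columns 2-10: predictors
    best_ridge_coef.ydot.n10 <- 10  # toy placeholder; in a real analysis, the ridge coefficients of the 9 predictors
    boot(ydot_matrix_n10,alassoOLS_ydot_n10_fn,R=50)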
    

    The output is reported below.

    ORDINARY NONPARAMETRIC BOOTSTRAP
    
    Bootstrap Statistics :
           original       bias    std. error
    t1*  0.00000000 -0.002330197  0.02319543
    t2*  0.13530886 -0.001906712  0.09889174
    t3* -0.19509877 -0.013020365  0.07251921
    t4* -0.01954785  0.015227018  0.09184750
    t5*  0.05600451  0.008896392  0.08729263
    t6*  0.12978757 -0.013795860  0.11320119
    t7*  0.06525111 -0.007208380  0.09703813
    t8*  0.09368079 -0.017343037  0.08947958
    t9* -0.09518469 -0.003352512  0.08575450
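Because the padded function records unselected covariates as exactly 0, the bootstrap replicates also show how often glmnet picked each covariate. Below is a minimal sketch. It assumes only that the object returned by boot() stores the replicates in its `t` component (one row per replicate, one column per covariate). `selection_frequency()` and `fake_boot` are hypothetical names used for illustration, and the fake object lets the snippet run without fitting any model.

```r
# Hypothetical helper: share of bootstrap replicates in which each
# covariate was selected, i.e. its recorded coefficient is non-zero.
# NA coefficients (e.g. aliased covariates in lm) are ignored.
selection_frequency <- function(boot_obj) {
  colMeans(boot_obj$t != 0, na.rm = TRUE)
}

# Fake stand-in for a boot object: 3 replicates, 2 covariates.
fake_boot <- list(t = matrix(c(0, 0.5, -0.2,
                               0, 0,    0.3), nrow = 3))
selection_frequency(fake_boot)  # 0.6666667 0.3333333
```

With the example above, `b <- boot(ydot_matrix_n10, alassoOLS_ydot_n10_fn, R = 50)` followed by `selection_frequency(b)` would give one value per covariate, in the same order as t1*, t2*, and so on.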