Search code examples
rsimulationlme4

Calculating lmer() model within a simulation study: Error in coefficients(): subscript out of bounds


I tried to simulate multilevel data for a simulation study where I systematically vary the ICC. In the following example dgp is the function to simulate data, onesimulation is just one analysis, and manysimulations is where I go through all the simulations-conditions.

The functions seem to work, yet every now and then I get the following Error and the function manysimulations is aborted.

fixed-effect model matrix is rank deficient so dropping 1 column / coefficient
boundary (singular) fit: see ?isSingular
Error in coefficients(summary(model))["zj", c("Estimate", "Std. Error",  : 
  subscript out of bounds
Called from: t(coefficients(summary(model))["zj", c("Estimate", 
    "Std. Error", "Pr(>|t|)")])
Browse[1]>

Now coefficients(summary(model))["zj", c("Estimate","Std. Error", "Pr(>|t|)")] is just where I access the coefficients of the model. Yet it seems sometimes something goes wrong.

Following the code of the simulation:

niter <- 50
#numbers of iterations
ni <- 20
#numbers of student per class
nj <- 10
#numbers of clusters/classes

ICC <- seq(0,.9,.1)
#starting with 0, then going up .1 stepwise till .9, thus generating 10 values

# what is the random intercept standard deviation given icc? 
RI_sd <- sqrt(1 / (1-ICC) - 1) 
# RI_sd


gamma00 <- 0
#is the mean value of the L1 dependent variable, controlling for the L2 predijtor Wj
gamma01 <- 0
#is the effect (slope) of the L2 predictor
gamma10 <- 0
#is the mean value of Level1 slope, controlling for the L2 predictor Wj


sigma2 = 1
#level 1 residual with sigma2 = 1

# design matrix
design_grid <- expand.grid(ni = ni,
                      nj = nj, 
                      RI_sd = RI_sd, 
                      replication = 1:niter)



dgp <- function(ni,nj, RI_sd){
  
  dgp_grid <- expand.grid(
    ni = 1:ni,
    nj = 1:nj,
    xij = NA,
    zj = NA,
    yij = NA,
    Rij = NA,
    U0j = NA
  )
  
  dgp_grid$zj <- rep(rbinom(nj,1,0.5), each = length(1:ni))
  #create a random factorial level 2 predictor, same value for the whole cluster
  
  dgp_grid$U0j <- rep(rnorm(nj, mean = 0, sd = RI_sd), each = length(ni))
  #create level 2 residual
  
  dgp_grid$Rij <- rnorm(nj*nj, mean = 0, sd = sqrt(sigma2))
  # create level 1 residual with sigma2 = 1
  
  dgp_grid$xij <- rbinom(nj*nj,1,0.5)
  # create level 1 explanatory/predictor variable (draw from standard normal)
  
  
  dgp_grid$yij <-  gamma00 + gamma01 * dgp_grid$xij + gamma10 * dgp_grid$zj + dgp_grid$U0j + dgp_grid$Rij
  #creating Yij
  
  return(dgp_grid)
}

test_dgp<-dgp(ni = 10, nj = 5, RI_sd = 5)
#test if function works



onesimulation <- function(ni,nj,RI_sd){
  
  dgp_grid<-dgp(ni,nj,RI_sd)
  #creating the data
  
  model <- lmer(yij ~ xij + zj + (1|nj), data = dgp_grid)
  
  coefs_x <- t(coefficients(summary(model))["xij", c("Estimate", "Std. Error", "Pr(>|t|)")])
  coefs_z <- t(coefficients(summary(model))["zj", c("Estimate", "Std. Error", "Pr(>|t|)")])
  res <- cbind(data.frame(var = "x", coefs_x),
               data.frame(var = "z", coefs_z))
  
  
  colnames(res)[2] ="estimate_x"
  colnames(res)[3] ="std.err_x"
  colnames(res)[4] ="p_x"
  colnames(res)[6] ="estimate_z"
  colnames(res)[7] ="std.err_z"
  colnames(res)[8] ="p_z"
  
  return(res)
  
}

test_onesimulation<-onesimulation(ni = 10, nj = 5, RI_sd = 5)
#test if the function works

manysimulations <- function(ni, nj, RI_sd){
  
  sim_results <- expand.grid(ni = ni,
                             nj = nj, 
                             RI_sd = RI_sd, 
                             replication = 1:niter,
                             estimate_x = NA,
                             std.err_x = NA,
                             p_x = NA,
                             estimate_z = NA,
                             std.err_z = NA,
                             p_z = NA
                             )
  
  
  for(i in 1:nrow(sim_results)){
    res<-onesimulation(ni = sim_results$ni[i], nj = sim_results$nj[i], RI_sd = sim_results$RI_sd[i])
    # analysisgrid[i,2:ncol(analysisgrid)] <- res
    # print(res)
    sim_results$estimate_x[i] <- res$estimate_x
    sim_results$std.err_x[i] <- res$std.err_x
    sim_results$p_x[i] <- res$p_x
    sim_results$estimate_z[i] <- res$estimate_z
    sim_results$std.err_z[i] <- res$std.err_z
    sim_results$p_z[i] <- res$p_z
  }

  return(sim_results)
}

sim_results <- manysimulations(ni = ni, nj = nj, RI_sd = RI_sd)

I don't know why that is, nor do I know when the error occurs.

It seems that within coefficients(), something can be assigned...

Have any of you already encountered that error?

Thanks!


Solution

  • The way to avoid this problem without needing tryCatch() is to detect the problem upstream, in your case something like:

    cc <- coefficients(summary(model))
    if (!"zj" %in% rownames(cc)) {
      coefs_z <- rep(NA, 3)  ## I *think* this is the right shape/length?
    } else {
      coefs_z <- t(cc["zj", c("Estimate", "Std. Error", "Pr(>|t|)")])
    }
    

    You may find that broom.mixed::tidy() gives you results in a slightly more convenient format ...