I tried to simulate multilevel data for a simulation study where I systematically vary the ICC.
In the following example dgp
is the function to simulate data, onesimulation
is just one analysis, and manysimulations
is where I go through all the simulations-conditions.
The functions seem to work, yet every now and then I get the following Error and the function manysimulations
is aborted.
fixed-effect model matrix is rank deficient so dropping 1 column / coefficient
boundary (singular) fit: see ?isSingular
Error in coefficients(summary(model))["zj", c("Estimate", "Std. Error", :
subscript out of bounds
Called from: t(coefficients(summary(model))["zj", c("Estimate",
"Std. Error", "Pr(>|t|)")])
Browse[1]>
Now coefficients(summary(model))["zj", c("Estimate","Std. Error", "Pr(>|t|)")]
is just where I access the coefficients of the model. Yet it seems sometimes something goes wrong.
Following the code of the simulation:
niter <- 50
#numbers of iterations
ni <- 20
#numbers of student per class
nj <- 10
#numbers of clusters/classes
ICC <- seq(0,.9,.1)
#starting with 0, then going up .1 stepwise till .9, thus generating 10 values
# what is the random intercept standard deviation given icc?
RI_sd <- sqrt(1 / (1-ICC) - 1)
# RI_sd
gamma00 <- 0
#is the mean value of the L1 dependent variable, controlling for the L2 predijtor Wj
gamma01 <- 0
#is the effect (slope) of the L2 predictor
gamma10 <- 0
#is the mean value of Level1 slope, controlling for the L2 predictor Wj
sigma2 = 1
#level 1 residual with sigma2 = 1
# design matrix
design_grid <- expand.grid(ni = ni,
nj = nj,
RI_sd = RI_sd,
replication = 1:niter)
dgp <- function(ni,nj, RI_sd){
dgp_grid <- expand.grid(
ni = 1:ni,
nj = 1:nj,
xij = NA,
zj = NA,
yij = NA,
Rij = NA,
U0j = NA
)
dgp_grid$zj <- rep(rbinom(nj,1,0.5), each = length(1:ni))
#create a random factorial level 2 predictor, same value for the whole cluster
dgp_grid$U0j <- rep(rnorm(nj, mean = 0, sd = RI_sd), each = length(ni))
#create level 2 residual
dgp_grid$Rij <- rnorm(nj*nj, mean = 0, sd = sqrt(sigma2))
# create level 1 residual with sigma2 = 1
dgp_grid$xij <- rbinom(nj*nj,1,0.5)
# create level 1 explanatory/predictor variable (draw from standard normal)
dgp_grid$yij <- gamma00 + gamma01 * dgp_grid$xij + gamma10 * dgp_grid$zj + dgp_grid$U0j + dgp_grid$Rij
#creating Yij
return(dgp_grid)
}
test_dgp<-dgp(ni = 10, nj = 5, RI_sd = 5)
#test if function works
onesimulation <- function(ni,nj,RI_sd){
dgp_grid<-dgp(ni,nj,RI_sd)
#creating the data
model <- lmer(yij ~ xij + zj + (1|nj), data = dgp_grid)
coefs_x <- t(coefficients(summary(model))["xij", c("Estimate", "Std. Error", "Pr(>|t|)")])
coefs_z <- t(coefficients(summary(model))["zj", c("Estimate", "Std. Error", "Pr(>|t|)")])
res <- cbind(data.frame(var = "x", coefs_x),
data.frame(var = "z", coefs_z))
colnames(res)[2] ="estimate_x"
colnames(res)[3] ="std.err_x"
colnames(res)[4] ="p_x"
colnames(res)[6] ="estimate_z"
colnames(res)[7] ="std.err_z"
colnames(res)[8] ="p_z"
return(res)
}
test_onesimulation<-onesimulation(ni = 10, nj = 5, RI_sd = 5)
#test if the function works
manysimulations <- function(ni, nj, RI_sd){
sim_results <- expand.grid(ni = ni,
nj = nj,
RI_sd = RI_sd,
replication = 1:niter,
estimate_x = NA,
std.err_x = NA,
p_x = NA,
estimate_z = NA,
std.err_z = NA,
p_z = NA
)
for(i in 1:nrow(sim_results)){
res<-onesimulation(ni = sim_results$ni[i], nj = sim_results$nj[i], RI_sd = sim_results$RI_sd[i])
# analysisgrid[i,2:ncol(analysisgrid)] <- res
# print(res)
sim_results$estimate_x[i] <- res$estimate_x
sim_results$std.err_x[i] <- res$std.err_x
sim_results$p_x[i] <- res$p_x
sim_results$estimate_z[i] <- res$estimate_z
sim_results$std.err_z[i] <- res$std.err_z
sim_results$p_z[i] <- res$p_z
}
return(sim_results)
}
sim_results <- manysimulations(ni = ni, nj = nj, RI_sd = RI_sd)
I don't know why that is, nor do I know when the error occurs.
It seems that within coefficients()
, something can be assigned...
Have any of you already encountered that error?
Thanks!
The way to avoid this problem without needing tryCatch()
is to detect the problem upstream, in your case something like:
cc <- coefficients(summary(model))
if (!"zj" %in% rownames(cc)) {
coefs_z <- rep(NA, 3) ## I *think* this is the right shape/length?
} else {
coefs_z <- t(cc["zj", c("Estimate", "Std. Error", "Pr(>|t|)")])
}
You may find that broom.mixed::tidy()
gives you results in a slightly more convenient format ...