I've resampled with replacement a data set 1000 times and now want to fit three models to each of these 1000 data sets and bag their AIC scores. The end goal of this procedure is to obtain mean AIC scores for each model across all models along with their 95% confidence intervals. The code below is faulty and I don't know where I am making a mistake. What happens is that the final matrix only contains the AIC score vectors from the first few iterations (i.e. not all 1000). Is there an error in the way I initialized the main matrix or the vectors at each iteration? Or maybe my row-adding procedure is flawed? Or, if the code is correct, could it be something with the datasets that are being fed into this code? If the latter is the case, then why am I not getting an error when the code reads in these datasets and just skips them? I've been struggling with this for days and am very puzzled, so any help would be appreciated.
require(lme4)
require(lmerTest)
# initializing an empty matrix for storing each vector of AIC scores from each iteration
# the matrix has width 3 because three models are fitted at each iteration
AIC.scores = data.frame(matrix(, nrow = 0, ncol = 3))
#fit regression models to each of 1000 datasets
for(iter in 1:1000){
#retrieving the data set, named accordingly, for the current iteration
data = read.csv(paste("data_set_", iter,".csv", sep=""), header=TRUE)
#initializing vector of AICs from models in current iteration
AIC.score = vector(mode="numeric", length=3)
mod1 = lmer(RT.log ~ crit.var1.log.std +
(1|Subject) +
(1|Item),
data = data,
REML=FALSE)
AIC.score[1] = summary(mod1)$AIC[1]
mod2 = lmer(RT.log ~ crit.var2.log.std +
(1|Subject) +
(1|Item),
data = data,
REML=FALSE)
AIC.score[2] = summary(mod2)$AIC[1]
mod3 = lmer(RT.log ~ crit.var3.log.std +
(1|Subject) +
(1|Item),
data = data,
REML=FALSE)
AIC.score[3] = summary(mod3)$AIC[1]
#adding vector of AICs scores from current iteration to main matrix
AIC.scores = rbind(AIC.scores, t(AIC.score))
cat("bagging iteration", iter, "completed!\n")
}
#renaming column names in AIC score matrix
colnames(AIC.scores) = c("model1", "model2", "model3")
# function for calculating mean AIC and 95% C.I.s for each model across all iterations
norm.interval = function(data, z=1.96) {
mean = mean(data)
variance = var(data)
sd = sqrt(variance/length(data))
c(mean, mean - z * sd, mean + z * sd)
}
for (i in 1:3) {
cat("The mean, lCI, uCI for model", i, "are:", norm.interval(AIC.scores[,i]), "\n")
}
Shot in the dark without knowing specifically what your lmer models are or what you data is.
Read in all your data.frame as a single list:
all.data <- lapply(paste0("data_set_", 1:1000, ".csv"), read.csv, header=TRUE)
Churn out the lmer formulae in text form, based on the list of data above:
all.form <- lapply(paste0("all.data[[", 1:1000, "]]"), function(x) list(
mod1 = paste0("lmer(RT.log ~ crit.var1.log.std + (1|Subject) + (1|Item), REML=FALSE, data =", x, ")"),
mod2 = paste0("lmer(RT.log ~ crit.var2.log.std + (1|Subject) + (1|Item), REML=FALSE, data =", x, ")"),
mod3 = paste0("lmer(RT.log ~ crit.var3.log.std + (1|Subject) + (1|Item), REML=FALSE, data =", x, ")")
))
Execute the lmer formula and write down the models into a single list:
all.lmer.mod <- lapply(all.form. function(x) lapply(x, function(y) eval(parse(text=y))))
Extract all AIC value of lmer models:
all.AIC <- lapply(all.lmer.mod, function(x) lapply(x, AIC))
Be warned that the process will take long if you have large data.frame and you have many subjects and items level. Test it out by changing 1:1000
on top to say 1:2
first.