I have 3 data sets and wish to run the same linear model on all of them, then store the coefficient and its upper and lower confidence limits.
set.seed(1)
school1 = data.frame(student = sample(c(1:100), 100, r = T),
score = runif(100))
school2 = data.frame(student = sample(c(1:100), 100, r = T),
score = runif(100))
school3 = data.frame(student = sample(c(1:100), 100, r = T),
score = runif(100))
schools = list('school1', 'school2', 'school3')
storage <- vector('list', length(schools))
for(i in seq_along(schools)){
tmpdat <- schools[[i]]
tmp <- lm(score ~ x1, data = tmpdat)
storage[[i]] <- summary(tmp)$coef[1]
}
I wish to make WANT which stores all the information and also the name of dataset:
WANT = data.frame(data = c('school1', 'school2', 'school3'),
coef = c(0,0,0),
coefLL = c(0,0,0),
coefUL=c(0,0,0))
but I am struggling. I loop over the datasets but do not know how to store all the information I need. Also, I have to do this for about 1000 data sets, so the most efficient way possible would be best. Thank you so much!
There are a few odd things about your setup: you don't have a list of school data sets, you have a list of school names. By "the coefficient", do you mean you're only interested in the slope (throwing away the intercept)? And why do you have a predictor variable x1 in your model when it's not in your data?
library(broom)
library(tidyverse)
schoolnames <- c('school1', 'school2', 'school3')
schools <- mget(schoolnames)
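## one list slot per data set (avoids hard-coding the number of schools)
res <- vector(length = length(schools), mode = "list")
names(res) <- schoolnames
for(i in seq_along(schools)){
  tmp <- lm(score ~ student, data = schools[[i]])
  ## keep only the slope row, with its estimate and confidence limits
  res[[i]] <- (tidy(tmp, conf.int = TRUE)
    |> filter(term == "student")
    |> select(estimate, conf.low, conf.high)
  )
}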
WANT <- bind_rows(res, .id = "school")
You could also use purrr::map() for this.
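As a rough sketch of what the purrr::map() version might look like: the helper name fit_slope is made up for illustration, and the data are rebuilt here as a named list only so that the block runs on its own (in practice you would pass the result of mget(schoolnames)).

```r
library(broom)
library(dplyr)
library(purrr)

set.seed(1)
## stand-in data in the same shape as the question's, stored as a named list
make_school <- function() {
  data.frame(student = sample(1:100, 100, replace = TRUE),
             score = runif(100))
}
schools <- list(school1 = make_school(),
                school2 = make_school(),
                school3 = make_school())

## hypothetical helper: fit the model, keep the slope and its 95% CI
fit_slope <- function(dat) {
  lm(score ~ student, data = dat) |>
    tidy(conf.int = TRUE) |>
    filter(term == "student") |>
    select(estimate, conf.low, conf.high)
}

## map() keeps the list names, which bind_rows() turns into the id column
WANT <- schools |>
  map(fit_slope) |>
  bind_rows(.id = "school")
```

Because the list is named, no separate bookkeeping of data set names is needed, and the same code should work unchanged for 1000 data sets.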
If for some reason you wanted to do this in a lower-tech way, you could:
n <- length(schoolnames)
res <- data.frame(schools = schoolnames, est = rep(NA_real_, n),
                  lwr = rep(NA_real_, n), upr = rep(NA_real_, n))
for(i in seq_along(schools)){
  tmp <- lm(score ~ student, data = schools[[i]])
  ci <- confint(tmp)
  ## use element 2/row 2 to pick out the slope coefficient/CIs;
  ## assign by column name so the 'schools' column is not overwritten
  res$est[i] <- coef(tmp)[2]
  res$lwr[i] <- ci[2,1] ## lower CI limit is in column 1
  res$upr[i] <- ci[2,2] ## upper CI limit is in column 2
}
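If you'd rather avoid the pre-allocated loop altogether, a base-R variant along these lines might also work. This is only a sketch: slope_ci is a made-up helper name, and the data are regenerated here so the block is self-contained.

```r
set.seed(1)
## stand-in data in the same shape as the question's, stored as a named list
make_school <- function() {
  data.frame(student = sample(1:100, 100, replace = TRUE),
             score = runif(100))
}
schools <- list(school1 = make_school(),
                school2 = make_school(),
                school3 = make_school())

## hypothetical helper: one-row data frame with the slope and its 95% CI
slope_ci <- function(dat) {
  fit <- lm(score ~ student, data = dat)
  ci <- confint(fit)["student", ]
  data.frame(est = unname(coef(fit)["student"]),
             lwr = unname(ci[1]),
             upr = unname(ci[2]))
}

## rbind() uses the list names as row names; move them into a column
res <- do.call(rbind, lapply(schools, slope_ci))
res <- cbind(schools = rownames(res), res)
rownames(res) <- NULL
```

Selecting the slope by name ("student") rather than by position is a small safeguard in case the model formula changes later.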