I suspect that there are parallels with other questions but I haven't been able to find a combination which works in this situation.
In essence I am trying to use a for loop to do multiple imputation (I am aware that there are packages which can do the whole analysis but I need to extract interim stages).
Starting with a data set like this, where y is the outcome, x has the missing values and the imputed missing values are in a1-3.
dat <- data.frame(y = c(1,0,0,1,0),
x= c(1,NA,2,NA,1),
a1 = c(NA,1,NA,2,NA),
a2 = c(NA,1,NA,1,NA),
a3 = c(NA,2,NA,1,NA))
I thought that I could loop over a1-3, coalescing the values into the x variable and then running the anaysis but this doesn't seem to work. I've tried:
for (i in 1:3) {
dat$x <- coalesce(dat$x,dat$a[i])
z <- glm(y ~ x, data=dat, family="binomial")
res <- summary(z)$coefficient
print(res)
}
But the first line dat$x <- coalesce(dat$x,dat$a[i])
clearly isn't doing what I think it should as all three "res" are the same and if I print dat$x after this line it hasn't been updated with the values from A1-3.
I'd thought that using mutate from dplyr might work as in:
for (i in 1:3) {
dat %>%
mutate(x = coalesce(x,a[i]))
z <- glm(y ~ x, data=dat, family="binomial")
res <- summary(z)$coefficient
print(res)
}
But that gives the error "object 'a' not found". I've attempted writing a function and then using apply instead of trying to loop but didn't get any further with that approach. Based on similar questions, I guess this is all to do with how the loop is using the column names but I've not managed to implement any of the solutions successfully.
Any hints gratefully received.
You can get each of your models like this:
r = dat %>% reframe(across(a1:a3,\(a) {
list(glm(y~coalesce(a,x), family="binomial"))
}))
Output:
# A tibble: 1 × 3
a1 a2 a3
<list> <list> <list>
1 <glm> <glm> <glm>
You can then manipulate these in further downstream steps, depending on what you want to extract, how you want to store it, etc [update your question with more specifics about what kind of output you actually want]
If it is just printing to console that you want, you can actually follow your original approach as well, but revise in this manner:
for (i in 1:3) {
dat$tmp <- coalesce(dat[[paste0("a",i)]],dat[['x']])
z <- glm(y ~ tmp, data=dat, family="binomial")
res <- summary(z)$coefficient
print(res)
}
Output:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.3862944 2.828427 -0.4901291 0.6240425
tmp 0.6931472 1.870829 0.3705028 0.7110079
Estimate Std. Error z value Pr(>|z|)
(Intercept) 18.56607 6522.639 0.002846404 0.9977289
tmp -18.56607 6522.639 -0.002846405 0.9977289
Estimate Std. Error z value Pr(>|z|)
(Intercept) 20.95236 7604.236 0.002755354 0.9978015
tmp -20.25922 7604.236 -0.002664201 0.9978743