Search code examples
rloopsimputation

Using a for loop to run multiple imputation in R


I suspect that there are parallels with other questions but I haven't been able to find a combination which works in this situation.

In essence I am trying to use a for loop to do multiple imputation (I am aware that there are packages which can do the whole analysis but I need to extract interim stages).

Starting with a data set like this, where y is the outcome, x has the missing values and the imputed missing values are in a1-3.

dat <- data.frame(y = c(1,0,0,1,0),
                  x= c(1,NA,2,NA,1),
                 a1 = c(NA,1,NA,2,NA),
                 a2 = c(NA,1,NA,1,NA),
                 a3 = c(NA,2,NA,1,NA))

I thought that I could loop over a1-3, coalescing the values into the x variable and then running the anaysis but this doesn't seem to work. I've tried:

for (i in 1:3) {
  dat$x <- coalesce(dat$x,dat$a[i])
  z <- glm(y ~ x, data=dat, family="binomial")
  res  <- summary(z)$coefficient
  print(res)
}

But the first line dat$x <- coalesce(dat$x,dat$a[i]) clearly isn't doing what I think it should as all three "res" are the same and if I print dat$x after this line it hasn't been updated with the values from A1-3.

I'd thought that using mutate from dplyr might work as in:

for (i in 1:3) {
  dat %>%
  mutate(x = coalesce(x,a[i]))
  z <- glm(y ~ x, data=dat, family="binomial")
  res  <- summary(z)$coefficient
  print(res)
}

But that gives the error "object 'a' not found". I've attempted writing a function and then using apply instead of trying to loop but didn't get any further with that approach. Based on similar questions, I guess this is all to do with how the loop is using the column names but I've not managed to implement any of the solutions successfully.

Any hints gratefully received.


Solution

  • You can get each of your models like this:

    r = dat %>% reframe(across(a1:a3,\(a) {
      list(glm(y~coalesce(a,x), family="binomial"))
    }))
    

    Output:

    # A tibble: 1 × 3
      a1     a2     a3    
      <list> <list> <list>
    1 <glm>  <glm>  <glm> 
    

    You can then manipulate these in further downstream steps, depending on what you want to extract, how you want to store it, etc [update your question with more specifics about what kind of output you actually want]

    If it is just printing to console that you want, you can actually follow your original approach as well, but revise in this manner:

    for (i in 1:3) {
      dat$tmp <- coalesce(dat[[paste0("a",i)]],dat[['x']])
      z <- glm(y ~ tmp, data=dat, family="binomial")
      res  <- summary(z)$coefficient
      print(res)
    }
    

    Output:

                  Estimate Std. Error    z value  Pr(>|z|)
    (Intercept) -1.3862944   2.828427 -0.4901291 0.6240425
    tmp          0.6931472   1.870829  0.3705028 0.7110079
                 Estimate Std. Error      z value  Pr(>|z|)
    (Intercept)  18.56607   6522.639  0.002846404 0.9977289
    tmp         -18.56607   6522.639 -0.002846405 0.9977289
                 Estimate Std. Error      z value  Pr(>|z|)
    (Intercept)  20.95236   7604.236  0.002755354 0.9978015
    tmp         -20.25922   7604.236 -0.002664201 0.9978743