mice: splitting imputed data for further analysis


I am using the mice package (version 3.3.0) to perform multiple imputation (MI). The MI procedure works fine. For further analysis I would like to split the imputed data by the variable ‘group’, as shown in the example below.

library(mice)

d <- nhanes
d$group <- as.factor(c(rep("A", 13), rep("B", 12)))
str(d)

imp <- mice(d)

fit <- with(imp, lm(bmi ~ age + chl + group))
est <- pool(fit)
summary(est, digits=3)

# What I would like to do is something like this (this does not work as written):
imp.A <- imp[which(group=="A")]
imp.B <- imp[which(group=="B")]

fit.A <- with(imp.A, lm(bmi ~ age + chl))
fit.B <- with(imp.B, lm(bmi ~ age + chl))

Is it possible to split imputed data somehow?


Solution

  • I think this code can be used to achieve what you are asking for

    First, create a long-format version of all your imputed datasets. Keep the original data in it (include = TRUE), because as.mids() needs it later:

    d.long <- mice::complete(imp, "long", include = TRUE)
    

    Next perform your grouping as normal using base R

    d.long.A <- d.long[which(d.long$group == 'A'),]
    d.long.B <- d.long[which(d.long$group == 'B'),]
    

    Then change these back to mids objects, so you can perform mice operations

    imp.A <- as.mids(d.long.A)
    imp.B <- as.mids(d.long.B)
    

    You'll probably get a warning message, because group is now constant within each subset.

    Warning message:
    Number of logged events: 1
    imp.A$loggedEvents
      it im dep     meth      out
    1  0  0     constant group
    

    This shouldn't be a problem; it's just mice telling you that there is a constant variable in your dataset. Finally, you can use your new subsets for your regression models:

    fit.A <- with(imp.A, lm(bmi ~ age + chl))
    fit.B <- with(imp.B, lm(bmi ~ age + chl))
    

    Then use pool() on each fit to get the pooled results. I'm not entirely sure why you want to do this instead of just including the group variable in your regression model, but I assume you have a reason for it. Hope this helps!
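
The steps above can be condensed with split() and lapply(). This is only a minimal sketch: it assumes the same nhanes example as in the question and that as.mids() accepts each long-format subset the same way it does in the answer. The printFlag and seed arguments, and the names imps, fits and pooled, are additions for illustration, used to keep the run quiet and reproducible.

```r
library(mice)

# Same toy data as in the question
d <- nhanes
d$group <- as.factor(c(rep("A", 13), rep("B", 12)))

# Impute once on the full data (seed and printFlag are illustrative choices)
imp <- mice(d, printFlag = FALSE, seed = 1)

# Long format, keeping the original data (.imp == 0), which as.mids() requires
d.long <- complete(imp, "long", include = TRUE)

# Split on group and convert each piece back to a mids object;
# mice may log that group is constant within each subset
imps <- lapply(split(d.long, d.long$group), as.mids)

# Fit the model per group on every imputed dataset, then pool per group
fits   <- lapply(imps, function(x) with(x, lm(bmi ~ age + chl)))
pooled <- lapply(fits, pool)

summary(pooled$A)
summary(pooled$B)
```

The result is one pooled fit per level of group, so the same code should also cover more than two groups without further changes.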