I am using the mice package (version 3.3.0) to perform multiple imputation (MI). The MI procedure works fine. For further analysis I would like to split (subset) the imputed data by the variable ‘group’, as indicated in the example below.
library(mice)
d <- nhanes
d$group <- as.factor(c(rep("A", 13), rep("B", 12)))
str(d)
imp <- mice(d)
fit <- with(imp, lm(bmi ~ age + chl + group))
est <- pool(fit)
summary(est, digits=3)
# What I would like to do is something like:
imp.A <- imp[which(group=="A")]
imp.B <- imp[which(group=="B")]
fit.A <- with(imp.A, lm(bmi ~ age + chl))
fit.B <- with(imp.B, lm(bmi ~ age + chl))
Is it possible to split imputed data somehow?
I think this code can be used to achieve what you are asking for.
First create a long format version of all your datasets:
d.long <- mice::complete(imp, "long", include = TRUE)
Next, subset by group as usual using base R:
d.long.A <- d.long[which(d.long$group == 'A'),]
d.long.B <- d.long[which(d.long$group == 'B'),]
Then change these back to mids objects, so you can perform mice operations:
imp.A <- as.mids(d.long.A)
imp.B <- as.mids(d.long.B)
You'll probably get a warning message because group is now a constant.
Warning message:
Number of logged events: 1
imp.A$loggedEvents
  it im dep     meth   out
1  0  0     constant group
But this shouldn't be a problem; it's just mice telling you there is a constant value in your dataset. Finally, you can use your new subsets for your regression models:
fit.A <- with(imp.A, lm(bmi ~ age + chl))
fit.B <- with(imp.B, lm(bmi ~ age + chl))
Use pool() to get the pooled results for each subset. I'm not entirely sure why you want to do this instead of just including the group variable in your regression model, but I assume you have a reason for this. Hope this helps!
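For reference, here is a minimal end-to-end sketch of the steps above, including the pooling step. The seed value and printFlag = FALSE are my own additions for reproducibility and quieter output, and the names pooled.A, pooled.B and fit.int are only illustrative. The last part shows one possible way to keep the group variable in a single model via interaction terms; that is a different model from two separate fits (it assumes a common residual variance, for instance), so treat it as an alternative to consider rather than an equivalent.

```r
library(mice)

# Example data from the question: nhanes plus a two-level grouping factor
d <- nhanes
d$group <- as.factor(c(rep("A", 13), rep("B", 12)))

# Impute once on the full data (seed chosen arbitrarily for reproducibility)
imp <- mice(d, seed = 123, printFlag = FALSE)

# Long format, including the original incomplete data as .imp == 0
d.long <- mice::complete(imp, "long", include = TRUE)

# Subset per group and convert back to mids objects
# (expect a logged-events warning, since group is constant within each subset)
imp.A <- as.mids(d.long[which(d.long$group == "A"), ])
imp.B <- as.mids(d.long[which(d.long$group == "B"), ])

# Fit the model on every imputed dataset within each subset
fit.A <- with(imp.A, lm(bmi ~ age + chl))
fit.B <- with(imp.B, lm(bmi ~ age + chl))

# Pool the per-imputation estimates
pooled.A <- pool(fit.A)
pooled.B <- pool(fit.B)
summary(pooled.A)
summary(pooled.B)

# Possible alternative: one model on the full imputed data, where the
# interaction terms allow the age and chl slopes to differ by group
fit.int <- with(imp, lm(bmi ~ (age + chl) * group))
summary(pool(fit.int))
```

Note that the imputation itself is still done on the full data; only the analysis is split by group.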