I need to calculate the sum of some variables with imputed values. I did this with complete
--> as.mids
--> with
--> do.call
I needed to do the same thing but in a survey context. Therefore, I did: update
--> with
--> MIcombine
The means of the variables calculated both ways do not match. Which one is correct?
You may check this different behavior in this toy database:
library(tidyverse)
library(mice)
library(mitools)
library(survey)
mydata <- structure(list(dis1 = c(NA, NA, 1, 0, 0, 1, 1, 1, 1, 0),
dis2 = c(0, 1, 0, 1, NA, 1, 1, 1, 1, 0),
dis3 = c(1, 1, 0, 0, NA, 1, 1, 1, 1, 0),
sex = c(0,0,0,1,0,1,1,1,1,0),
clus = c(1,1,1,1,1,2,2,2,2,2)),
row.names = c(NA, 10L),
class = c("tbl_df", "tbl", "data.frame") )
imp <- mice::mice(mydata, m = 5, seed = 237856)
# calculating numenf with mice::complete
long <- mice::complete(imp, action = "long", include = TRUE)
long$numenf <- long$dis1 + long$dis2 + long$dis3
imp2 <- mice::as.mids(long)
res <- with(imp2, mean(numenf))
do.call(mean, res$analyses) # mean = 2.1
#calculating numenf with update (from survey)
imp1 <- mice::complete(imp)
imp2 <- mice::complete(imp, 2)
imp3 <- mice::complete(imp, 3)
imp4 <- mice::complete(imp, 4)
imp5 <- mice::complete(imp, 5)
listimp <- mitools::imputationList(list(imp1, imp2, imp3, imp4, imp5))
clus <- survey::svydesign(id = ~clus, data = listimp)
clus <- stats::update(clus, numenf = dis1 + dis2 + dis3)
res <- with(clus, survey::svymean(~numenf))
summary(mitools::MIcombine(res)) # mean = 1.98
Answer
Replace do.call(mean, res$analyses)
with mean(unlist(res$analyses))
.
Rationale
In the first code snippet, res$analyses
is a list
. When entering it into do.call
, you are essentially calling:
mean(res$analyses[1], res$analyses[2], res$analyses[3], res$analyses[4], res$analyses[5])
mean
takes the average of a vector
in its first argument. The other arguments are not used properly (see ?mean
). Hence, you're just getting 2.1
back, since that is the (mean of the) value of first analysis.
We can make a vector
out of the list
by using unlist(res$analyses)
. Then, we can just feed it to mean
as an argument:
mean(unlist(res$analyses))