
Different return values of the sum of a row with imputed values using 'complete' (mice) and 'update' (survey)


I need to calculate the sum of several variables that contain imputed values. With mice alone, I did this via complete --> as.mids --> with --> do.call.

I then needed to do the same thing in a survey context, so I used update --> with --> MIcombine.

The mean of the summed variable differs between the two approaches. Which one is correct?

You can reproduce the discrepancy with this toy dataset:

library(tidyverse)
library(mice)
library(mitools)
library(survey)

mydata <- structure(list(dis1 = c(NA, NA, 1, 0, 0, 1, 1, 1, 1, 0), 
                         dis2 = c(0, 1, 0, 1, NA, 1, 1, 1, 1, 0), 
                         dis3 = c(1, 1, 0, 0, NA, 1, 1, 1, 1, 0),
                         sex = c(0,0,0,1,0,1,1,1,1,0),
                         clus = c(1,1,1,1,1,2,2,2,2,2)), 
                    row.names = c(NA, 10L), 
                    class = c("tbl_df", "tbl", "data.frame") )

imp <- mice::mice(mydata, m = 5, seed = 237856)

# calculating numenf with mice::complete 
long <- mice::complete(imp, action = "long", include = TRUE)
long$numenf <- long$dis1 + long$dis2 + long$dis3
imp2 <- mice::as.mids(long)
res <- with(imp2, mean(numenf))
do.call(mean, res$analyses) # mean = 2.1

# calculating numenf with update (from survey)
# extract each of the 5 completed datasets (imp2 is reused here as a data frame)
imp1 <- mice::complete(imp)
imp2 <- mice::complete(imp, 2)
imp3 <- mice::complete(imp, 3)
imp4 <- mice::complete(imp, 4)
imp5 <- mice::complete(imp, 5)
listimp <- mitools::imputationList(list(imp1, imp2, imp3, imp4, imp5))
clus <- survey::svydesign(id = ~clus, data = listimp)
clus <- stats::update(clus, numenf = dis1 + dis2 + dis3)
res <- with(clus, survey::svymean(~numenf))
summary(mitools::MIcombine(res)) # mean = 1.98

Solution

  • Answer

    Replace do.call(mean, res$analyses) with mean(unlist(res$analyses)).


    Rationale

    In the first code snippet, res$analyses is a list. do.call passes each element of that list to mean as a separate argument, so you are essentially calling:

    mean(res$analyses[[1]], res$analyses[[2]], res$analyses[[3]], res$analyses[[4]], res$analyses[[5]])
    

    mean only averages the vector passed as its first argument, x. The remaining positional arguments are not treated as data: in mean.default the next two are matched to trim and na.rm, and the rest are ignored (see ?mean). Hence, you just get 2.1 back, since that is the value of the first analysis alone.


    We can make a vector out of the list by using unlist(res$analyses). Then, we can just feed it to mean as an argument:

    mean(unlist(res$analyses))
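
    To see the difference without running the imputation, here is a minimal sketch using a plain list. The values in analyses are made up for illustration and only stand in for res$analyses; they are not the actual results from the question.

```r
# Hypothetical per-imputation means standing in for res$analyses
analyses <- list(2.1, 1.9, 2.0, 1.8, 2.2)

# do.call() spreads the list into separate arguments, i.e.
# mean(2.1, 1.9, 2.0, 1.8, 2.2), so only the first value is treated as data
first_only <- do.call(mean, analyses)

# unlist() collapses the list into one numeric vector,
# so all five values are averaged
pooled <- mean(unlist(analyses))

print(first_only)
print(pooled)
```

    Here first_only is just the first element of the list, while pooled is the average of all five elements.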