Issue with user-defined function for descriptive statistics from imputed data


I am trying to write a function that calculates the mean and SD of a variable from a multiply imputed dataset (a mids object). The code works fine outside of a function (as shown in the two examples below), but gives the wrong result once it is wrapped in a function: it keeps returning the values for bmi even when I ask for chl.

Any insight into this issue is appreciated. Eventually I would like this function to be able to calculate means and SDs for multiple variables at once (i.e., bmi and chl) but that is likely a separate question.

library(mice, warn.conflicts = FALSE)
data(nhanes)
imp <- mice(nhanes, m = 3, print = FALSE, seed = 123)

# workflow that i want to automate
# from here: https://bookdown.org/mwheymans/bookmi/data-analysis-after-multiple-imputation.html
# example 1 - bmi
impdat <- mice::complete(imp, action = "long", include = FALSE)
pool_mean <- with(impdat, by(impdat, .imp, function(x) c(mean(x$bmi), sd(x$bmi))))
result <- (Reduce("+", pool_mean)/length(pool_mean))
print(result)
#> [1] 27.117333  3.980506
rm(impdat, pool_mean, result)

# example 2 - chl
impdat <- mice::complete(imp, action = "long", include = FALSE)
pool_mean <- with(impdat, by(impdat, .imp, function(x) c(mean(x$chl), sd(x$chl))))
result <- (Reduce("+", pool_mean)/length(pool_mean))
print(result)
#> [1] 195.10667  39.95247
rm(impdat, pool_mean, result)

# automating the workflow
automate <- function(a, b) {
  impdat <- mice::complete(a, action = "long", include = FALSE)
  pool_mean <- with(impdat, by(impdat, .imp, function(x) c(mean(x$b), sd(x$b))))
  result <- (Reduce("+", pool_mean)/length(pool_mean))
  print(result)
}

automate(a=imp, b=bmi) # looks correct ... ?
#> [1] 27.117333  3.980506
automate(a=imp, b=chl) # no, it isn't
#> [1] 27.117333  3.980506

Solution

  • Two and a half problems here:

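    1. b = bmi passes a bare name, which R would normally try to evaluate as an object bmi, and no such object exists in the global environment. We can use deparse(substitute(b)) to capture the argument as the string "bmi" instead of evaluating it.
    2. Accessor function $, see ?Extract: "Both [[ and $ select a single element of the list. The main difference is that $ does not allow computed indices, whereas [[ does." Inside the function, x$b therefore looks for a column literally named b. Because $ also matches names partially, it silently returns bmi, the only column whose name starts with "b", no matter what was passed as b. x[[b]] uses the value of b instead.

    Both fixes together:
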
    automate <- function(a, b) {
      b <- deparse(substitute(b))
      impdat <- mice::complete(a, action = "long", include = FALSE)
      pool_mean <- with(impdat, by(impdat, .imp, function(x) c(mean(x[[b]]), sd(x[[b]]))))
      (Reduce("+", pool_mean)/length(pool_mean))
    }
    automate(a=imp, b=bmi)
    #> [1] 27.117333  3.980506
    automate(a=imp, b=chl)
    #> [1] 195.10667  39.95247
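
Both points can be checked on a small data frame without mice. This is only a sketch: the data frame df (with made-up values) and the helper show_name() are hypothetical and do not come from the question.

```r
# Toy data frame standing in for one imputed dataset (values are made up).
df <- data.frame(bmi = c(22.5, 27.1, 30.3), chl = c(187, 199, 204))

# Point 1: substitute() captures the unevaluated argument and deparse()
# turns it into a string, so an object called `chl` never has to exist.
show_name <- function(b) deparse(substitute(b))  # hypothetical helper
show_name(chl)
#> [1] "chl"

# Point 2: `$` takes the name after it literally and matches it partially.
b <- "chl"
df$b      # looks for a column starting with "b" and finds bmi
#> [1] 22.5 27.1 30.3
df[[b]]   # evaluates b first, then extracts the column "chl"
#> [1] 187 199 204
```

Setting options(warnPartialMatchDollar = TRUE) makes R warn whenever $ falls back to a partial match, which would have flagged the original x$b.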
    

    To do this for several variables at once, we can rewrite it slightly. Here the variable names are passed as character strings, so deparse(substitute()) is not needed:

    automate_list <- function(a, ...) {
      vars <- c(...)  # variable names as character strings
      impdat <- mice::complete(a, action = "long", include = FALSE)
      lapply(vars, function(x) {
        pool_mean <- with(impdat, by(impdat, .imp, function(y) c(mean(y[[x]]), sd(y[[x]]))))
        Reduce("+", pool_mean) / length(pool_mean)
      }) |>
        setNames(vars)
    }
    
    automate_list(imp, "bmi", "chl")
    #> $bmi
    #> [1] 27.117333  3.980506
    #>
    #> $chl
    #> [1] 195.10667  39.95247
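
If you would rather keep passing bare names, as in automate(a=imp, b=bmi), one possible variant captures everything in ... with substitute(). The sketch below is an assumption-laden illustration, not part of the answer above: pool_desc() is a hypothetical name, and to stay runnable without mice it takes a long-format data frame with an .imp column (what mice::complete(imp, action = "long") returns) rather than the mids object itself. The toy data are made up.

```r
# Hypothetical helper: `impdat` is a long-format data frame with an .imp
# column, e.g. the result of mice::complete(imp, action = "long").
pool_desc <- function(impdat, ...) {
  # Capture the bare names in ... as strings without evaluating them.
  vars <- vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
  out <- lapply(vars, function(v) {
    # Mean and SD within each imputed dataset, then averaged across datasets.
    per_imp <- lapply(split(impdat[[v]], impdat$.imp),
                      function(x) c(mean = mean(x), sd = sd(x)))
    Reduce("+", per_imp) / length(per_imp)
  })
  setNames(out, vars)
}

# Toy stand-in for two imputed datasets (values are made up).
toy <- data.frame(.imp = c(1, 1, 2, 2),
                  bmi  = c(20, 22, 24, 26),
                  chl  = c(180, 200, 190, 210))
pool_desc(toy, bmi, chl)
```

The result is a named list with one mean/sd pair per variable, in the same shape as the output of automate_list().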