I am trying to write a function that will calculate the mean and SD for a variable from a multiply imputed dataframe (mids). The code works fine outside of the function (as shown in two examples below), but will produce unreliable results when placed inside of a function. The function seems to keep giving results for bmi despite calling upon chl.
Any insight into this issue is appreciated. Eventually I would like this function to be able to calculate means and SDs for multiple variables at once (i.e., bmi and chl) but that is likely a separate question.
library(mice, warn.conflicts = FALSE)
data(nhanes)
imp <- mice(nhanes, m = 3, print = FALSE, seed = 123)
# workflow that i want to automate
# from here: https://bookdown.org/mwheymans/bookmi/data-analysis-after-multiple-imputation.html
# example 1 - bmi
impdat <- mice::complete(imp, action = "long", include = FALSE)
pool_mean <- with(impdat, by(impdat, .imp, function(x) c(mean(x$bmi), sd(x$bmi))))
result <- (Reduce("+", pool_mean)/length(pool_mean))
print(result)
#> [1] 27.117333 3.980506
rm(impdat, pool_mean, result)
# example 2 - chl
impdat <- mice::complete(imp, action = "long", include = FALSE)
pool_mean <- with(impdat, by(impdat, .imp, function(x) c(mean(x$chl), sd(x$chl))))
result <- (Reduce("+", pool_mean)/length(pool_mean))
print(result)
#> [1] 195.10667 39.95247
rm(impdat, pool_mean, result)
# automating the workflow
automate <- function(a, b) {
  impdat <- mice::complete(a, action = "long", include = FALSE)
  pool_mean <- with(impdat, by(impdat, .imp, function(x) c(mean(x$b), sd(x$b))))
  result <- (Reduce("+", pool_mean)/length(pool_mean))
  print(result)
}
automate(a=imp, b=bmi) # looks correct ... ?
#> [1] 27.117333 3.980506
automate(a=imp, b=chl) # no, it isn't
#> [1] 27.117333 3.980506
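Two and a half problems here:
1. b = bmi looks like an object bmi, which does not exist in our global environment. We can use deparse(substitute(b)) for this, to tell the function to wait with the evaluation and hand us the argument name as a string.
2. $ cannot take a column name stored in a variable, see ?Extract: "Both [[ and $ select a single element of the list. The main difference is that $ does not allow computed indices". So x$b looks for a column literally called b, and because $ matches names partially, it resolves to bmi every time. Use x[[b]] instead.
automate <- function(a, b) {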
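  # capture the unquoted argument (e.g. chl) as the string "chl"
  b <- deparse(substitute(b))
  impdat <- mice::complete(a, action = "long", include = FALSE)
  # [[b]] selects the column whose name is stored in b
  pool_mean <- with(impdat, by(impdat, .imp, function(x) c(mean(x[[b]]), sd(x[[b]]))))
  (Reduce("+", pool_mean)/length(pool_mean))
}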
automate(a=imp, b=bmi)
[1] 27.117333 3.980506
automate(a=imp, b=chl)
[1] 195.10667 39.95247
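The difference between $ and [[ can be checked without mice. The sketch below uses a toy data frame with made-up values (only the column names mirror nhanes), and col_name is a hypothetical helper written for this illustration, not part of any package:

```r
# Toy data frame; column names mirror nhanes, the values are made up.
df <- data.frame(bmi = c(20, 25, 30), chl = c(150, 200, 250))

b <- "chl"

# `$` takes the name literally: it looks for a column called "b",
# and partial matching resolves that to "bmi".
via_dollar <- df$b

# `[[` evaluates b first, so it selects the "chl" column.
via_brackets <- df[[b]]

# deparse(substitute()) turns an unquoted argument into a string.
# `col_name` is a hypothetical helper for illustration only.
col_name <- function(b) deparse(substitute(b))
unquoted <- col_name(chl)

print(via_dollar)
print(via_brackets)
print(unquoted)
```

This is why the original function returned the bmi numbers no matter which variable was passed in.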
To do this on a list of variables, we can rewrite it slightly to
automate_list <- function(a, ...) {
  vars <- c(...)  # column names, passed as character strings
  impdat <- mice::complete(a, action = "long", include = FALSE)
  lapply(vars, function(x) {
    # mean and SD of column x within each imputed dataset
    pool_mean <- with(impdat, by(impdat, .imp, function(y) c(mean(y[[x]]), sd(y[[x]]))))
    # average the per-imputation estimates
    Reduce("+", pool_mean)/length(pool_mean)
  }) |>
    setNames(vars)
}
automate_list(imp, "bmi", "chl")
$bmi
[1] 27.117333 3.980506
$chl
[1] 195.10667 39.95247
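If a single table is more convenient than a list, one possible variant is to collect the results with vapply. This is only a sketch: the toy impdat below stands in for the output of mice::complete(imp, action = "long") with made-up values, and pool_mean_sd is a hypothetical name, not a mice function.

```r
# Toy stand-in for the long-format imputed data: two imputations,
# values made up for illustration.
impdat <- data.frame(
  .imp = rep(1:2, each = 3),
  bmi  = c(20, 25, 30, 22, 26, 30),
  chl  = c(150, 200, 250, 160, 200, 240)
)

# Hypothetical helper: mean and SD per imputation, then averaged
# across imputations, for each requested column.
pool_mean_sd <- function(dat, vars) {
  vapply(vars, function(v) {
    # 2 x m matrix: rows are mean and sd, one column per imputation
    per_imp <- vapply(split(dat[[v]], dat$.imp),
                      function(x) c(mean = mean(x), sd = sd(x)),
                      numeric(2))
    rowMeans(per_imp)
  }, numeric(2))
}

res <- pool_mean_sd(impdat, c("bmi", "chl"))
print(res)
```

The result is a matrix with one column per variable and rows named mean and sd, which can be easier to format for a report than a nested list.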