I want to create a variable that is computed from the values of two other variables in an imputed dataset. Is there a way to do this?
For example, using the nhanes dataset, which I have imputed with m = 16 imputations (using the mice package, below), how would I add a new variable var_new equal to chl - bmi?
library(mice)
aux_vart <- mice::quickpred(
  nhanes,
  mincor = 0.1
)
imp <- mice::mice(nhanes, predictorMatrix = aux_vart, m = 16, method = "pmm")
I tried adding the variable to my original dataset and then imputing, but because the new variable is a deterministic function of the others, this resulted in nonconvergence of my models and wildly inaccurate parameter estimates in other models I've fitted.
First create the completed data sets, then add the column to each one:
all_sets <- lapply(1:16, function(x) complete(imp, x))
final <- lapply(all_sets, function(x) cbind(x, var_new=x$chl - x$bmi))
Now final is a list containing all 16 data sets, final[[1]] to final[[16]]. For example:
str(final[[1]])
# 'data.frame': 25 obs. of 5 variables:
# $ age : num 1 2 1 3 1 3 1 1 2 2 ...
# $ bmi : num 28.7 22.7 22 22.7 20.4 25.5 22.5 30.1 22 26.3 ...
# $ hyp : num 1 1 1 1 1 1 1 1 1 2 ...
# $ chl : num 187 187 187 218 113 184 118 187 238 206 ...
# $ var_new: num 158.3 164.3 165 195.3 92.6 ...
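If the goal is to fit and pool a model on var_new rather than just inspect the completed data, a plain list of data frames is less convenient than a mids object. One possible route, sketched here as an alternative rather than as part of the answer above, is to stack the data in long format, add the column, and convert back with as.mids(). The seed and the lm(var_new ~ age) model are assumptions made only for illustration.

```r
library(mice)

# Same setup as in the question; the seed is an assumption added only
# to make this sketch reproducible.
aux_vart <- quickpred(nhanes, mincor = 0.1)
imp <- mice(nhanes, predictorMatrix = aux_vart, m = 16, method = "pmm",
            seed = 123, printFlag = FALSE)

# Stack the original data (.imp == 0) and the 16 completed sets.
long <- complete(imp, action = "long", include = TRUE)

# Derive the new variable row by row. In the original data it stays NA
# wherever chl or bmi was missing.
long$var_new <- long$chl - long$bmi

# Convert the long data back to a mids object.
imp_new <- as.mids(long)

# Hypothetical analysis model, chosen only for illustration.
fit <- with(imp_new, lm(var_new ~ age))
summary(pool(fit))
```

Because var_new is computed after imputation, it never enters the imputation model, which avoids the collinearity problem described in the question.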