
Adding a new variable to an imputed dataset based on imputed values


I want to create a variable that is computed from the values of two other variables in an imputed dataset. Is there a way to achieve this?

For example, take the nhanes dataset, for which I have generated m = 16 imputations with the mice package (code below). Is there a way to add a new variable var_new that equals chl - bmi?

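library(mice)

# Build the predictor matrix, using a minimum correlation threshold of 0.1
aux_vart <- mice::quickpred(
  nhanes,
  mincor = 0.1
)

# Generate m = 16 imputed data sets with predictive mean matching
imp <- mice::mice(nhanes, predictorMatrix = aux_vart, m = 16, method = "pmm")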

I tried adding the variable to my original dataset and imputing from that, but because the new variable is a function of the others, my models failed to converge and other models I fitted gave wildly inaccurate parameter estimates.


Solution

  • First create the completed data sets, then add the column to each one:

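    # Extract each of the 16 completed data sets from the mids object
    all_sets <- lapply(seq_len(imp$m), function(i) complete(imp, i))
    # Append var_new = chl - bmi to every completed data set
    final <- lapply(all_sets, function(x) cbind(x, var_new = x$chl - x$bmi))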

    final is now a list holding all 16 completed data sets, final[[1]] to final[[16]]. For example:

    str(final[[1]])
    # 'data.frame': 25 obs. of  5 variables:
    #  $ age    : num  1 2 1 3 1 3 1 1 2 2 ...
    #  $ bmi    : num  28.7 22.7 22 22.7 20.4 25.5 22.5 30.1 22 26.3 ...
    #  $ hyp    : num  1 1 1 1 1 1 1 1 1 2 ...
    #  $ chl    : num  187 187 187 218 113 184 118 187 238 206 ...
    #  $ var_new: num  158.3 164.3 165 195.3 92.6 ...
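
If the goal is to fit and pool models on var_new rather than only inspect the completed data, one option is to keep everything in a mids object instead of a plain list. The sketch below goes beyond the original answer: it stacks the imputations in long format with complete(), adds the column once, and converts back with as.mids(). The seed, the printFlag setting, the name imp_new, and the model lm(var_new ~ age) are assumptions made for illustration and are not part of the question.

```r
library(mice)

# Same setup as in the question; the seed is an assumption added for reproducibility.
aux_vart <- quickpred(nhanes, mincor = 0.1)
imp <- mice(nhanes, predictorMatrix = aux_vart, m = 16, method = "pmm",
            seed = 123, printFlag = FALSE)

# Stack the original data (.imp == 0) and all 16 completed data sets.
long <- complete(imp, action = "long", include = TRUE)

# Derive the new variable once, across every stacked data set.
# Rows of the original data with missing chl or bmi stay NA.
long$var_new <- long$chl - long$bmi

# Convert back to a mids object so that with() and pool() work as usual.
imp_new <- as.mids(long)

# Illustrative model only: regress the derived variable on age, then pool.
fit <- with(imp_new, lm(var_new ~ age))
summary(pool(fit))
```

Because var_new is computed after imputation, it never enters the imputation model, which avoids the convergence problem described in the question. complete(imp_new, 1) then returns the first completed data set with var_new already attached.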