r statistics missing-data interaction r-mice

Model multiple imputation with interaction terms

According to the documentation of the mice package, if we want to impute data when we're interested in interaction terms we need to use passive imputation. Which is done the following way.

library(mice)
nhanes2.ext <- cbind(nhanes2, bmi.chl = NA)
ini <- mice(nhanes2.ext, max = 0, print = FALSE)

meth <- ini$meth
meth["bmi.chl"] <- "~I((bmi-25)*(chl-200))"

pred <- ini$pred
pred[c("bmi", "chl"), "bmi.chl"] <- 0

imp <- mice(nhanes2.ext, meth = meth, pred = pred, seed = 51600, print = FALSE)

It is said that

Imputations created in this way preserve the interaction of bmi with chl

Here, a new variable called bmi.chl is created in the original dataset. The meth step tells how this variable needs to be imputed from the existing ones. The pred step says we don't want to predict bmi and chl from bmi.chl. But now, if we want to apply a model, how do we proceed? Is the product defined by "~I((bmi-25)*(chl-200))" is just a way to control for the imputed values of the main effects, i.e. bmi and chl?

If the model we want to fit is glm(hyp~chl*bmi, family="binomial"), what is the correct way to specify this model from the imputed data? fit1 or fit2?

fit1 <- with(data=imp, glm(hyp~chl*bmi, family="binomial"))
summary(pool(fit1))

Or do we have to use somehow the imputed values of the new variable created, i.e. bmi.chl?

fit2 <- with(data=imp, glm(hyp~chl+bmi+bmi.chl, family="binomial"))
summary(pool(fit2))

Solution

With passive imputation, it does not matter if you use the passively imputed variable, or if you re-calculate the product term in your call to glm. The reason that fit1 and fit2 yield different results in your example is because are not just doing passive imputation for the product term.

Instead you are transforming the two variables befor multiplying (i.e., you calculate bmi-25 and chl-100). As a result, the passively imputed variable bmi.chl does not represent the product term bmi*chl but rather (bmi-25)*(chl-200).

If you just calculate the product term, then fit1 and fit2 yield the same results like they should:

library(mice)
nhanes2.ext <- cbind(nhanes2, bmi.chl = NA)
ini <- mice(nhanes2.ext, max = 0, print = FALSE)

meth <- ini$meth
meth["bmi.chl"] <- "~I(bmi*chl)"

pred <- ini$pred
pred[c("bmi", "chl"), "bmi.chl"] <- 0
pred[c("hyp"), "bmi.chl"] <- 1

imp <- mice(nhanes2.ext, meth = meth, pred = pred, seed = 51600, print = FALSE)

fit1 <- with(data=imp, glm(hyp~chl*bmi, family="binomial"))
summary(pool(fit1))
# > round(summary(pool(fit1)),2)
#                est    se     t    df Pr(>|t|)   lo 95 hi 95 nmis  fmi lambda
# (Intercept) -23.94 38.03 -0.63 10.23     0.54 -108.43 60.54   NA 0.41   0.30
# chl           0.10  0.18  0.58  9.71     0.58   -0.30  0.51   10 0.43   0.32
# bmi           0.70  1.41  0.49 10.25     0.63   -2.44  3.83    9 0.41   0.30
# chl:bmi       0.00  0.01 -0.47  9.67     0.65   -0.02  0.01   NA 0.43   0.33

fit2 <- with(data=imp, glm(hyp~chl+bmi+bmi.chl, family="binomial"))
summary(pool(fit2))
# > round(summary(pool(fit2)),2)
#                est    se     t    df Pr(>|t|)   lo 95 hi 95 nmis  fmi lambda
# (Intercept) -23.94 38.03 -0.63 10.23     0.54 -108.43 60.54   NA 0.41   0.30
# chl           0.10  0.18  0.58  9.71     0.58   -0.30  0.51   10 0.43   0.32
# bmi           0.70  1.41  0.49 10.25     0.63   -2.44  3.83    9 0.41   0.30
# bmi.chl       0.00  0.01 -0.47  9.67     0.65   -0.02  0.01   25 0.43   0.33

This is not surprising because the ~I(bmi*chl) in mice and the bmi*chl in glm do the exact same thing. They merely calculate the product of the two variables.

Remark:

Note that I added a line saying that bmi.chl should be used as a predictor when imputing hyp. Without this step, passive imputation has no purpose because the imputation model would neglect the product term, thus being incongruent with the analysis model.