According to the documentation of the mice package, if we want to impute data when we are interested in interaction terms, we need to use passive imputation, which is done the following way:
library(mice)
nhanes2.ext <- cbind(nhanes2, bmi.chl = NA)
ini <- mice(nhanes2.ext, max = 0, print = FALSE)
meth <- ini$meth
meth["bmi.chl"] <- "~I((bmi-25)*(chl-200))"
pred <- ini$pred
pred[c("bmi", "chl"), "bmi.chl"] <- 0
imp <- mice(nhanes2.ext, meth = meth, pred = pred, seed = 51600, print = FALSE)
The documentation says that "Imputations created in this way preserve the interaction of bmi with chl".
Here, a new variable called bmi.chl is created in the original dataset. The meth step specifies how this variable is to be imputed from the existing ones. The pred step says we don't want to predict bmi and chl from bmi.chl. But now, if we want to fit a model, how do we proceed? Is the product defined by "~I((bmi-25)*(chl-200))" just a way to control the imputed values of the main effects, i.e. bmi and chl?
If the model we want to fit is glm(hyp~chl*bmi, family="binomial"), what is the correct way to specify this model from the imputed data: fit1 or fit2?
fit1 <- with(data=imp, glm(hyp~chl*bmi, family="binomial"))
summary(pool(fit1))
Or do we somehow have to use the imputed values of the newly created variable, i.e. bmi.chl?
fit2 <- with(data=imp, glm(hyp~chl+bmi+bmi.chl, family="binomial"))
summary(pool(fit2))
With passive imputation, it does not matter whether you use the passively imputed variable or re-calculate the product term in your call to glm.
The reason that fit1 and fit2 yield different results in your example is that you are not just doing passive imputation for the product term. Instead, you are transforming the two variables before multiplying (i.e., you calculate bmi-25 and chl-200). As a result, the passively imputed variable bmi.chl does not represent the product term bmi*chl but rather (bmi-25)*(chl-200).
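To make the difference concrete, here is a minimal base-R sketch on simulated complete data (no imputation involved). The variable names mirror nhanes2, but the values, the outcome y, and the use of lm instead of glm are assumptions made purely for illustration. Since (bmi-25)*(chl-200) = bmi*chl - 200*bmi - 25*chl + 5000, the two specifications describe the same model in a different parametrisation: the fitted values and the coefficient of the product term should agree, while the intercept and the main effects shift.

```r
# Simulated, fully observed toy data (illustrative values only, not nhanes2)
set.seed(1)
n   <- 200
bmi <- rnorm(n, mean = 27, sd = 4)
chl <- rnorm(n, mean = 190, sd = 40)
y   <- 1 + 0.05 * bmi + 0.01 * chl + 0.002 * bmi * chl + rnorm(n)

raw     <- bmi * chl                  # plain product term
centred <- (bmi - 25) * (chl - 200)   # product of the shifted variables

# The shifted product is the plain product plus terms linear in bmi and chl
identity_ok <- isTRUE(all.equal(centred, raw - 200 * bmi - 25 * chl + 5000))

fit_raw     <- lm(y ~ bmi + chl + raw)
fit_centred <- lm(y ~ bmi + chl + centred)

# Compare the product-term coefficient and the fitted values of both fits
same_interaction <- isTRUE(all.equal(unname(coef(fit_raw)["raw"]),
                                     unname(coef(fit_centred)["centred"])))
same_fitted      <- isTRUE(all.equal(fitted(fit_raw), fitted(fit_centred)))

# Main effects differ between the two parametrisations
main_effects <- rbind(raw     = coef(fit_raw)[c("bmi", "chl")],
                      centred = coef(fit_centred)[c("bmi", "chl")])
print(main_effects)
```

So the mismatch between fit1 and fit2 in the question is best read as a change in what the intercept and main-effect coefficients mean, not as a sign that one of the two imputations went wrong.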
If you just calculate the product term, then fit1 and fit2 yield the same results, as they should:
library(mice)
nhanes2.ext <- cbind(nhanes2, bmi.chl = NA)
ini <- mice(nhanes2.ext, max = 0, print = FALSE)
meth <- ini$meth
meth["bmi.chl"] <- "~I(bmi*chl)"
pred <- ini$pred
pred[c("bmi", "chl"), "bmi.chl"] <- 0
pred[c("hyp"), "bmi.chl"] <- 1
imp <- mice(nhanes2.ext, meth = meth, pred = pred, seed = 51600, print = FALSE)
fit1 <- with(data=imp, glm(hyp~chl*bmi, family="binomial"))
summary(pool(fit1))
# > round(summary(pool(fit1)),2)
# est se t df Pr(>|t|) lo 95 hi 95 nmis fmi lambda
# (Intercept) -23.94 38.03 -0.63 10.23 0.54 -108.43 60.54 NA 0.41 0.30
# chl 0.10 0.18 0.58 9.71 0.58 -0.30 0.51 10 0.43 0.32
# bmi 0.70 1.41 0.49 10.25 0.63 -2.44 3.83 9 0.41 0.30
# chl:bmi 0.00 0.01 -0.47 9.67 0.65 -0.02 0.01 NA 0.43 0.33
fit2 <- with(data=imp, glm(hyp~chl+bmi+bmi.chl, family="binomial"))
summary(pool(fit2))
# > round(summary(pool(fit2)),2)
# est se t df Pr(>|t|) lo 95 hi 95 nmis fmi lambda
# (Intercept) -23.94 38.03 -0.63 10.23 0.54 -108.43 60.54 NA 0.41 0.30
# chl 0.10 0.18 0.58 9.71 0.58 -0.30 0.51 10 0.43 0.32
# bmi 0.70 1.41 0.49 10.25 0.63 -2.44 3.83 9 0.41 0.30
# bmi.chl 0.00 0.01 -0.47 9.67 0.65 -0.02 0.01 25 0.43 0.33
This is not surprising because the ~I(bmi*chl) in mice and the bmi*chl in glm do the exact same thing. They merely calculate the product of the two variables.
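One way to check this on the imputed data themselves is sketched below. It repeats the setup from the code above (with the argument names written out in full) and then compares, in every completed dataset, the stored bmi.chl with a freshly computed bmi*chl. The helper name max_gap is my own, and the sketch assumes the default visit sequence, in which bmi.chl is updated after bmi and chl; under that assumption the gaps are expected to be zero up to floating-point error.

```r
library(mice)

# Same setup as above: passive imputation of the plain product term
nhanes2.ext <- cbind(nhanes2, bmi.chl = NA)
ini  <- mice(nhanes2.ext, maxit = 0, printFlag = FALSE)
meth <- ini$method
meth["bmi.chl"] <- "~I(bmi*chl)"
pred <- ini$predictorMatrix
pred[c("bmi", "chl"), "bmi.chl"] <- 0
pred["hyp", "bmi.chl"] <- 1
imp <- mice(nhanes2.ext, method = meth, predictorMatrix = pred,
            seed = 51600, printFlag = FALSE)

# Largest absolute gap between the stored and the recomputed product,
# one value per completed dataset (max_gap is a hypothetical helper name)
max_gap <- sapply(seq_len(imp$m), function(i) {
  d <- complete(imp, i)
  max(abs(as.numeric(d$bmi.chl) - d$bmi * d$chl))
})
print(max_gap)
```

If the gaps are not negligible, the passive variable has fallen out of sync with its components, and the two model specifications would no longer be interchangeable.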
Remark:
Note that I added a line saying that bmi.chl should be used as a predictor when imputing hyp. Without this step, passive imputation has no purpose because the imputation model would neglect the product term and would thus be incongruent with the analysis model.