
Is there any way to use the mice package and the gWQS package together?


I am trying to impute the missing values of the C1-C3 variables in a large dataset using the mice package, and that has worked so far. The problem arises when I try to use the gWQS package to estimate the mixture effect of chemicals X1-X4.

I imputed the missing values of my covariates with mice and then tried to use the imputed data in the gWQS package to run a WQS regression. However, my code is not accepted because imp$imp is a list. I have also tried the miWQS package, but it restricts me to imputation methods that I do not want to use.

The original dataset consists of Y, a continuous outcome; X1-X4, continuous exposure measures; and C1-C3, covariates that were imputed with mice.

Imputation model using mice

imp <- mice::mice(originaldf, m=2, method=meth, predictorMatrix=pred,
                  seed=51162, visitSequence="monotone", printFlag=FALSE)

toxic_chems=c("X1" , "X2",  "X3", "X4")
set.seed(2019)

library("gWQS")
gwqs(Y ~ C1 + C2 + C3, mix_name=toxic_chems, data=imp$imp,
     q=4, validation=0.8, valid_var=NULL, b=10, b1_pos=FALSE, b1_constr=FALSE,
     family="gaussian", seed=2019, wqs2=TRUE, plots=TRUE, tables=TRUE)

Error:

Error in .check.function(formula, mix_name, data, q, validation, valid_var,  : 
                            data must be a data.frame

Solution

  • As you've already noticed, mice() does not return a data frame. It returns a "mids" object, and its imp component is a list with one element per variable that holds the imputed values, one column per imputation (two in your case, since you chose m=2). That's how multiple imputation works. Here is an example with the nhanes data included in mice:

    library(mice)
    imp <- mice(nhanes, m=2)
    imp$imp
    # $age
    # [1] 1 2
    # <0 rows> (or 0-length row.names)
    # 
    # $bmi
    #       1    2
    # 1  30.1 25.5
    # 3  27.2 28.7
    # 4  20.4 24.9
    # [...]
    # 
    # $hyp
    #    1 2
    # 1  1 1
    # 4  1 2
    # 6  1 2
    # [...]
    # 
    # $chl
    #      1   2
    # 1  187 187
    # 4  131 186
    # 10 229 187
    # [...]
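    To see why gwqs() rejects imp$imp, it can help to inspect the object directly. The sketch below uses the same nhanes data; the seed of 123 is an arbitrary addition for reproducibility. It checks that imp is a "mids" object, that imp$imp is a named list rather than a data frame, and that complete() is what writes the imputed values back into the original rows.

```r
library(mice)

# Impute the nhanes example data twice; the seed is only for reproducibility
imp <- mice(nhanes, m = 2, seed = 123, printFlag = FALSE)

class(imp)                 # "mids"
is.data.frame(imp$imp)     # FALSE: this is what gwqs() complains about
names(imp$imp)             # one element per variable: age, bmi, hyp, chl

# Each element holds only the imputed cells:
# one row per missing value, one column per imputation
miss_bmi <- which(is.na(nhanes$bmi))
dim(imp$imp$bmi)           # length(miss_bmi) rows, 2 columns

# complete() writes imputation 1 back into the original data frame
d1 <- complete(imp, 1)
is.data.frame(d1)          # TRUE
all.equal(d1$bmi[miss_bmi], imp$imp$bmi[[1]])
```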
    

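    If you were fitting OLS, the standard approach is to fit the model to each imputed dataset and then pool the results. In mice, with() dispatches to the with.mids method, which fits the model to every completed dataset, and pool() combines the estimates.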

    fit <- with(data=imp, expr=lm(bmi ~ age + hyp + chl))
    pool(fit)
    pool(fit)$pooled[, 1:5]  # shortened
    #                estimate         ubar            b            t dfcom
    # (Intercept) 20.28615169 1.354978e+01 6.556134e+00 2.338398e+01    21
    # age         -3.01670128 1.081655e+00 1.238383e-03 1.083512e+00    21
    # hyp          1.89935232 4.074904e+00 2.092851e+00 7.214181e+00    21
    # chl          0.04517373 3.813968e-04 5.113178e-06 3.890666e-04    21
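    with() is essentially a loop: it evaluates the model once per completed dataset and stores the fits in the analyses element of the returned "mira" object. The sketch below (nhanes again, with an arbitrary seed) writes that loop out by hand with complete() and checks that the coefficients agree. The hand-written version is the pattern you would fall back on for a function such as gwqs() that has no .mids method.

```r
library(mice)

imp <- mice(nhanes, m = 2, seed = 123, printFlag = FALSE)

# What with() does for you: one lm() per completed dataset
fit <- with(imp, lm(bmi ~ age + hyp + chl))

# The same thing written out by hand
manual <- lapply(seq_len(imp$m), function(i) {
  lm(bmi ~ age + hyp + chl, data = complete(imp, i))
})

coef_with   <- sapply(fit$analyses, coef)   # one column per imputation
coef_manual <- sapply(manual, coef)
all.equal(coef_with, coef_manual)           # should be TRUE
```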
    

    This is the point where you run into a problem: there is a glm.mids method, but there is no gwqs.mids method, so you would probably need to write one yourself or ask one of the package authors.

    However, mice also provides a complete() function, which returns a "data.frame" that you can use for pooled calculations as well. Use it with care, though: for pooling you need a format that keeps all imputations, such as "long". Analysing just one completed dataset (a single imputation) would be very wrong.

    complete(imp, "long")
    #    .imp .id age  bmi hyp chl
    # 1     1   1   1 30.1   1 187
    # 2     1   2   2 22.7   1 187
    # 3     1   3   1 27.2   1 187
    # [...]
    # 26    2   1   1 25.5   1 187
    # 27    2   2   2 22.7   1 187
    # 28    2   3   1 28.7   1 187
    # [...]
    class(complete(imp, "long"))
    # [1] "data.frame"
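    A quick way to see the difference: the "long" format stacks all m completed datasets (m * n rows, plus the .imp and .id columns), while complete(imp) with its default action returns only the first imputation. Treating that single dataset as if it were fully observed data is exactly what the caution above warns against. The seed below is an arbitrary choice.

```r
library(mice)

imp <- mice(nhanes, m = 2, seed = 123, printFlag = FALSE)

long   <- complete(imp, "long")   # all imputations stacked
single <- complete(imp)           # default action = 1: first imputation only

nrow(nhanes)                      # n = 25
nrow(long)                        # m * n = 50
nrow(single)                      # 25
table(long$.imp)                  # 25 rows per imputation
```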
    

    The ".imp" column indicates the imputation number (and ".id" the row in the original data), so you can fit your gwqs model separately to each subset of the data defined by ".imp".
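    A minimal sketch of that split-and-fit step, assuming the imputed data are in the "long" format. fit_by_imputation() is a hypothetical helper, not part of mice or gWQS: it drops the .imp and .id bookkeeping columns and passes each completed dataset to whatever fitting function you supply. lm() on nhanes stands in for gwqs() so the example runs without gWQS. The commented-out gwqs() call only reuses the arguments from the question; it is untested, and gwqs() argument names differ between gWQS versions.

```r
library(mice)

# Hypothetical helper: fit one model per imputation in a "long" data frame
fit_by_imputation <- function(long, fit_fun) {
  per_imp <- split(long, long$.imp)              # one data frame per imputation
  lapply(per_imp, function(d) {
    # drop the bookkeeping columns added by complete(..., "long")
    d <- d[, setdiff(names(d), c(".imp", ".id")), drop = FALSE]
    fit_fun(d)
  })
}

imp  <- mice(nhanes, m = 2, seed = 123, printFlag = FALSE)
long <- complete(imp, "long")

# Stand-in model so the example runs without gWQS
fits <- fit_by_imputation(long, function(d) lm(bmi ~ age + hyp + chl, data = d))
length(fits)   # one fit per imputation

# For the question's data this might look like (untested; arguments vary by gWQS version):
# fits <- fit_by_imputation(complete(imp, "long"), function(d)
#   gWQS::gwqs(Y ~ C1 + C2 + C3, mix_name = toxic_chems, data = d, q = 4,
#              validation = 0.8, valid_var = NULL, b = 10, b1_pos = FALSE,
#              b1_constr = FALSE, family = "gaussian", seed = 2019))
```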

    To pool the results, you then have to combine the within- and between-imputation variances (see Rubin 1987:76).
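    For a single scalar parameter, Rubin's rules take the m point estimates Q_i and their squared standard errors U_i and combine them as: pooled estimate Qbar = mean(Q_i); within-imputation variance Ubar = mean(U_i); between-imputation variance B = var(Q_i), with divisor m - 1; total variance T = Ubar + (1 + 1/m) * B. The last formula can be checked against the pool() output above: for age, 1.081655 + 1.5 * 0.001238383 is about 1.083512. The sketch below implements these rules in a hypothetical rubin_pool() helper and compares it with mice's pool.scalar() for the lm example. It assumes that each per-imputation gwqs fit gives you an estimate and a standard error for the WQS index coefficient (how you extract them depends on your gWQS version), and it omits the degrees-of-freedom adjustment needed for confidence intervals.

```r
library(mice)

# Hypothetical helper: Rubin's rules for one scalar parameter
# est: vector of m point estimates; se: vector of m standard errors
rubin_pool <- function(est, se) {
  m    <- length(est)
  qbar <- mean(est)              # pooled point estimate
  ubar <- mean(se^2)             # within-imputation variance
  b    <- var(est)               # between-imputation variance (divisor m - 1)
  t    <- ubar + (1 + 1/m) * b   # total variance
  c(estimate = qbar, ubar = ubar, b = b, t = t, se = sqrt(t))
}

imp <- mice(nhanes, m = 2, seed = 123, printFlag = FALSE)
fit <- with(imp, lm(bmi ~ age + hyp + chl))

# Estimate and standard error of the age coefficient in each imputation
est_age <- sapply(fit$analyses, function(f) coef(f)[["age"]])
se_age  <- sapply(fit$analyses, function(f) sqrt(vcov(f)["age", "age"]))

mine <- rubin_pool(est_age, se_age)
mine

# mice's own scalar pooling, for comparison
ref <- pool.scalar(Q = est_age, U = se_age^2)
c(ref$qbar, ref$t)   # should match mine["estimate"] and mine["t"]
```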

    Elaborating further on this would go beyond the scope of Stack Overflow. If you don't know how to do it, consult a statistician or ask on Cross Validated.

    At least, this is one way to use mice and gWQS together.