Search code examples
rloopslogistic-regressionimputation

Creating a loop for multiple dependent variables in logistic regression using a multiple imputation dataset?


This previous question "How to repeatedly perform glm over multiple dependent variables after mice?" - did not work for me. I don't understand how it incorporates the pooling of the mids.

######################

I need to repeat logistic regression for 7 dependent variables, using the same 5 predictor variables for each regression.

I need to do multiple imputation using mice before I perform the regressions.

I then need an output of exponentiated odds ratios and confidence intervals for each of the dependent variables.

Here is a smaller example:

 P1 <- c(7,8,5,NA)
 P2 <- c(10,12,11,12)
 P3 <- c(1,1,0,1)
 P4 <- c(1,1,1,0)

 R1 <- c(0,1,0,1)
 R2 <- c(0,0,1,0)
 R3 <- c(1,1,0,0)
 R4 <- c(0,1,1,1)
 R5 <- c(1,0,1,0)

df1 <- data.frame(P1,P2,P3,P4,R1,R2,R3,R4,R5)
cols <- c("P3","P4","R1","R2","R3","R4","R5")

 df1[cols] <- lapply(df1[cols], factor)

Multiple Imputation:

library(mice)
imp <- mice(df1, maxit=0)

predM <- imp$predictorMatrix

meth <- imp$method

imp2 <- mice(df1, maxit=5, 
         predictorMatrix = predM,
         method = meth, print = F)

Logistic Regression after pooling imputed data:

m1.mi <- with(imp2, glm(R1 ~ P1 + P2 + P3 + P4, family=binomial(link=logit)))

summary(pool(m1.mi),exponentiate=TRUE, conf.int = TRUE)

Is there a way to create a loop, that will produce an odds ratio and CIs for each dependent variable?

Any help appreciated, I apologise for my very limited knowledge in this area.


Solution

  • reformulate is neat for jobs like this. Define the process of one iteration in an anonymous function (those with function(x) or abbreviated \(x)) and put it into lapply. The summary can be subsetted to desired columns just like a data frame using the respective column names in brackets.

    Let's use the nhanes data coming with mice in an example.

    dvs <- c('age1', 'age2', 'age3')
    
    res <- lapply(dvs, \(x) {
      mi <- with(imp, glm(reformulate(c('bmi', 'hyp', 'chl'), x), family=binomial))
      s <- summary(pool(mi), exponentiate=TRUE, conf.int=TRUE)
      `rownames<-`(s[c('estimate', '2.5 %', '97.5 %')], s$term) |> as.matrix()
    }) |> setNames(dvs)
    

    This gives a named list like this one,

    res
    # $age1
    #                 estimate     2.5 %   97.5 %
    # (Intercept) 2.197201e+04 0.0000000      Inf
    # bmi         1.999346e+00 0.4691464 8.520549
    # hyp         4.331347e-08 0.0000000      Inf
    # chl         9.441578e-01 0.8403936 1.060734
    # 
    # $age2
    #              estimate        2.5 %      97.5 %
    # (Intercept) 0.9300514 0.0002224722 3888.105740
    # bmi         0.8891490 0.6550513927    1.206907
    # hyp         2.8818840 0.2054892623   40.416981
    # chl         1.0046189 0.9757483655    1.034344
    # 
    # $age3
    #              estimate        2.5 %       97.5 %
    # (Intercept) 3.5383401 1.087542e-07 1.151206e+08
    # bmi         0.6442681 2.792523e-01 1.486403e+00
    # hyp         5.9651279 5.954159e-02 5.976117e+02
    # chl         1.0323994 9.885890e-01 1.078151e+00
    

    where you can output individual elements in this way:

    res$age1
    #                 estimate     2.5 %   97.5 %
    # (Intercept) 2.197201e+04 0.0000000      Inf
    # bmi         1.999346e+00 0.4691464 8.520549
    # hyp         4.331347e-08 0.0000000      Inf
    # chl         9.441578e-01 0.8403936 1.060734
    

    Where

    class(res$age1)
    # [1] "matrix" "array" 
    

    If you want class "data.frame" in the list elements, just leave out the |> as.matrix() part.


    Data:

    data("nhanes", package='mice')
    nhanes <- transform(nhanes, age1=age == 1, age2=age == 2, age3=age == 3)
    library(mice)
    imp0 <- mice(nhanes, maxit=0)
    imp <- mice(nhanes, maxit=5, 
                predictorMatrix=imp0$predictorMatrix,
                method=imp0$method, print=F)