Search code examples
rr-mice

How to divide imputed data with 'survSplit' function and finally pool the analysis results using 'pool' function in MICE package


When I do cox regression using R, I found the proporitonal hazard assumption is violated. So I decided to divide the data into different time intervals using the 'survSplit' function and calculate the HR for each time interval.

For example:

1) divide data into different time intervals

`D1 <- survSplit(Surv(time, status) ~ ., data= data1, cut=c(40, 60),
episode= "tgroup", id="id")`

2) to get HR for each time interval

`coxph(Surv(tstart, time, status) ~ age + class:strata(tgroup), data=D1)`

Now I want to do the same for multiply-imputed datasets (n=5).

step 1 data imputation

imp <- mice(data1, seed = 12345, n=5,print = FALSE)

step 2 reanalysis

fit <- with(imp, coxph(Surv(tstart, time, status) ~ age + class:strata(tgroup))

step 3 pool the results

`summary(pool(fit))`

My question is how I can post process the imputed data to divide each imputed dataset into different time intervals using 'survSplit' between step 1 and step 2. After post-processing, I can continue to do step 2 and step3.

Although I can extract each imputed dataset and then divide it using 'survSplit', I am not able to continue step2 and step 3 using the MICE package. Kindly help. Thanks.


Solution

  • In this case the best approach is to use {} inside the with() command, which will allow you to place multiple statements inside. Here's a demonstration, based on the example in the survSplit() documentation.

    library(survival)
    library(mice)
    set.seed(123)
    
    # use the veteran dataset with some values made missing
    veteran2 <- veteran
    veteran2[which(rbinom(nrow(veteran2),1,0.1) == 1),
             "karno"] <- NA
    # make imputations
    imps <- mice(veteran2, printFlag = FALSE)
    # fit the models, with survSplit used inside each call
    mods <- with(imps, 
                 {
                   dat <- data.frame(mget(ls()))
                   dat_split <- survSplit(Surv(time, status) ~ ., 
                                          data = dat,
                                          cut = c(60, 120), 
                                          episode = "timegroup")
                   coxph(Surv(tstart, time, status) ~ karno*strata(timegroup), 
                         data = dat_split)
                 })
    pooled <- pool(mods)
    summary(pooled)
    #>                                 term    estimate   std.error statistic
    #> 1                              karno -0.04986785 0.007715564 -6.463279
    #> 2 karno:strata(timegroup)timegroup=2  0.03645985 0.015247181  2.391252
    #> 3 karno:strata(timegroup)timegroup=3  0.04104428 0.013389159  3.065486
    #>          df      p.value
    #> 1  70.51041 1.141572e-08
    #> 2  94.68223 1.876833e-02
    #> 3 108.83144 2.740870e-03
    
    c(t0_60 = pooled$pooled$estimate[1],
      t60_120 = sum(pooled$pooled$estimate[c(1,2)]),
      t_120 = sum(pooled$pooled$estimate[c(1,3)]))
    #>        t0_60      t60_120        t_120 
    #> -0.049867847 -0.013407999 -0.008823565
    

    Created on 2022-08-24 by the reprex package (v2.0.1)