Search code examples
rgtsummaryr-micecox-regressionr-forestplot

How to convert a mimira object (Cox regression model, from multiple imputations and a propensity score matching (MatchThem pkg)) into a Forest plot


Dear StackOverflow community, as a surgeon, and full of enthusiasm for 6 months for R learning in self-taught mode (StackOverflow, and so many websites), I beg your indulgence in the triviality of my concern.

The background: Briefly, my objective is to run a survival cox model regression for a dataset of cancer patients. Due to the retrospective aspect, I planned to make a matching 1:3 with propensity score matching (PSM). The missing data were dealt with multiple imputations ("mice" pkg). The PSM was managed with "MatchThem" pkg. I used "survey" pkg for pooling the survival (svycoxph() pooled through with() function). This leads us to a mimira object, which I can easily print out into a beautiful Table, with tbl_regression ("gtsummary" pkg).

The issue: As a usually print my cox regressions into a Hazard ratios Table and a graphical version (Forest plot with ggforest(), from "survminer" pkg), this time I am really stuck. The function ggforest doesn't recognize the mimira object as a "coxph object" and send this error :

Error in ggforest(tbl_regression_object, data = mimira_object) : 
  inherits(model, "coxph") is not TRUE 

I guess that adding a PSM to my multiple imputations is the problem, as I had no problem for printing cox regression of multiple imputations with Forest plot (ggforest is able to deal mira objects without problem with pool_and_tidy_mice() function).

Here is the script:

#Data
library(fabricatr)
library(simsurv)

# Simulate patient data in a clinical trial
participant_data <- fabricate(
  N = 2000,
  age = runif(N, min = 18, max = 85),
  is_female = draw_binary(prob = 0.5, N = N),
  is_smoker = draw_binary(prob = 0.2 + 0.2 * (age > 50), N = N),
  disease_stage = round(runif(N, min = 1 + 0.5 * (age > 65), max = 4)),
  treatment = draw_binary(prob = 0.5, N = N),
  kps = runif(N, min = 40, max = 100)
)

# Simulate data in the survival context
survival_data <- simsurv(
  lambdas = 0.1, gammas = 1.8,
  x = participant_data, 
  betas = c(is_female = -0.2, is_smoker = 1.2,
            treatment = -0.4, kps = -0.005,
            disease_stage = 0.2),
  maxt = 5)

# Merging df
library(dplyr)
mydata_complete <- bind_cols(survival_data, participant_data)

# generating missing value
library(missMethods)
mydata_uncomp <- delete_MCAR(mydata_complete, 0.3)
mydata <- mydata_uncomp

#1 imputation with "mice"
library(mice)
mydata$nelsonaalen <- nelsonaalen(mydata, eventtime, status)
mydata_mice_imp_m3 <- mice(mydata, maxit = 2, m = 3, seed = 20200801) # m=3 is for testing


#2 matching (PSM 1:3) with "MatchThem"
library(MatchThem)
mydata_imp_m3_psm <- matchthem(treatment ~ age + is_female + disease_stage, data = mydata_mice_imp_m3, approach = "within" ,ratio= 1, method = "optimal")

#3 Pooling Coxph models in multiple imputed datasets and PSM with "survey"
library(survey)
mimira_object <- with(data = mydata_imp_m3_psm, expr = svycoxph(Surv(eventtime, status) ~ age+ is_smoker + disease_stage))
pool_and_tidy_mice(mimira_object, exponentiate = TRUE, conf.int=TRUE) -> pooled_imp_m3_cph

    # estimates with pool_and_tidy_mice() works with mimira_object but cannot bring me de degree of freedoms. Warning message :
In get.dfcom(object, dfcom) : Infinite sample size assumed.
> pooled_imp_m3_cph
           term  estimate   std.error  statistic p.value conf.low conf.high            b  df dfcom fmi    lambda m      riv         ubar
1           age 0.9995807 0.001961343 -0.2138208     NaN      NaN       NaN 1.489769e-06 NaN   Inf NaN 0.5163574 3 1.067643 1.860509e-06
2     is_smoker 2.8626952 0.093476026 11.2516931     NaN      NaN       NaN 4.182884e-03 NaN   Inf NaN 0.6382842 3 1.764601 3.160589e-03
3 disease_stage 1.2386947 0.044092483  4.8547535     NaN      NaN       NaN 8.995628e-04 NaN   Inf NaN 0.6169374 3 1.610540 7.447299e-04

#4 Table summary of the pooled results
library(gtsummary)
tbl_regression_object <- tbl_regression(mimira_object, exp=TRUE, conf.int = TRUE) # 95% CI and p-value are missing due to an issue with an other issue in the pooling of the mimira_object. The Matchthem:::get.2dfcom function gives a dfcom = 999999 (another issue to be solved in my concern)


#5 What it should looks like as graphical summary
library(survival)
mydata.cox <- coxph(Surv(eventtime, status) ~ age+ is_smoker + disease_stage, mydata_uncomp) # (df mydata_uncomp is without imputation and PSM)

    #with gtsummary
forestGT <- 
  mydata.cox %>%
  tbl_regression(exponentiate = TRUE,
                 add_estimate_to_reference_rows = TRUE) %>% 
  plot()
(forestGT) # See picture GT_plot1. Almost perfect. Would have been great to know how to add N, 95% CI, HR, p-value and parameters of the model (AIC, events, concordance, etc.)

    #with survminer
HRforest <- 
  survminer::ggforest(mydata.cox, data = mydata_uncomp)
(HRforest) # See picture Ggforest. Everything I need to know about my cox regression is all in there. For me it is just a great regression cox forest plot.

#6 Actually what happens when I do the same thing with imputed and matched df

    #with gtsummary
forestGT_imp_psm <- 
  mimira_object %>%
  tbl_regression(exponentiate = TRUE,
                 add_estimate_to_reference_rows = TRUE) %>% 
  plot() # WARNING message : In get.dfcom(object, dfcom) : Infinite sample size assumed.
(forestGT_imp_psm) # See picture GT_plot2. The plot is rendered but without 95% IC

    #with survminer
HRforest_imp_psm <- 
  ggforest(mimira_object, data = mydata_imp_m3_psm) # ERROR:in ggforest(mimira_object, data = mydata_imp_m3_psm) : inherits(model, "coxph") is not TRUE
(HRforest_imp_psm)

#7 The lucky and providential step
# your solution/advise

Would greatly appreciate your help.

cheers.

AK

Picture GT_plot1 (not allowed to embed images in this post, here is sharelink : GT_plot1

Picture Ggforest_plot Ggforest_plot

Picture GT_plot2 GT_plot2


Solution

  • It seems that there are two distinct problems here:

    Problem #1. getting gtsummary() to produce a table with p values and confidence intervals of the pooled, matched data

    Problem #2. producing a ggforest() to produce a plot of the pooled estimates.

    Problem #1:

    Let us follow the instructions in the paper "MatchThem:: Matching and Weighting after Multiple Imputation" (https://arxiv.org/ftp/arxiv/papers/2009/2009.11772.pdf) [page 15]

    and modify your block #3. Instead of calling pool_and_tidy_mice() we do the following:

    matched.results <- pool(mimira_object)
    summary(matched.results, conf.int = TRUE)
    

    This produces the following:

               term      estimate   std.error  statistic        df      p.value        2.5 %     97.5 %
    1           age -0.0005997864 0.001448251 -0.4141453 55.266353 6.803707e-01 -0.003501832 0.00230226
    2     is_smoker  1.1157796620 0.077943244 14.3152839  9.961064 5.713387e-08  0.942019234 1.28954009
    3 disease_stage  0.2360965310 0.051799813  4.5578645  3.879879 1.111782e-02  0.090504018 0.38168904
    

    This means that performing the imputation with mice and then matching with MatchThem works, since you do get the p values and the confidence intervals.

    Compare to the output from pool_and_tidy_mice():

               term      estimate   std.error  statistic p.value            b  df dfcom fmi    lambda m
    1           age -0.0005997864 0.001448251 -0.4141453     NaN 2.992395e-07 NaN   Inf NaN 0.1902260 3
    2     is_smoker  1.1157796620 0.077943244 14.3152839     NaN 2.041627e-03 NaN   Inf NaN 0.4480827 3
    3 disease_stage  0.2360965310 0.051799813  4.5578645     NaN 1.444843e-03 NaN   Inf NaN 0.7179644 3
            riv         ubar
    1 0.2349124 1.698446e-06
    2 0.8118657 3.352980e-03
    3 2.5456522 7.567636e-04
    

    Where everything is the same except for df and p.value which were not calculated in the latter table.

    I therefore think this is an issue with the pool_and_tidy_mice() and you should post this as an issue on GitHub at gtsummary.

    For right now, you can bypass this problem by changing svycoxph() to survival::coxph() in block #3 when you call the with() function. If you do that, then eventually you will get a gtsummary table with p.values and confidence intervals. Ultimately, the problem is probably some interaction between svycoxph() and pool_and_mice(), hence why I believe that you should post this on GitHub.

    Problem #2:

    The short answer is that there cannot be a ggforest plot with all the data that you are looking for.

    https://www.rdocumentation.org/packages/mice/versions/3.13.0/topics/pool reads:

    A common error is to reverse steps 2 and 3, i.e., to pool the multiply-imputed data instead of the estimates. Doing so may severely bias the estimates of scientific interest and yield incorrect statistical intervals and p-values. The pool() function will detect this case.

    This means that there is no "real" dataset for the pooled estimates (i.e. you cannot really combine the datasets for imputations 1-3), which means that ggforest() cannot compute the desired plot (since it needs to have a dataset and that cannot be used because it would lead to erroneous estimates).

    What you could do, is present all the ggforest plots for each imputation (so if you did 3 imputations, you will get 3 slightly different ggforest plots) and finally add the pooled estimates plot by using plot() as suggested above.

    To create each ggforest plot you need the following line of code:

    ggforest(mimira_object$analyses[[1]], complete(mydata_imp_m3_psm, 1))
    

    This will create the ggforest plot for your first imputation. Change the numbers to 2 and 3 to check the remaining imputations.

    I hope this helped,

    Alex