Search code examples
rsurvival-analysis

cumulative incidence plot for two groups with no competing risk


I am hoping someone can provide some help with an analysis I have.

Context: I have a dataset with patients who had liver transplant (some of them are in treatment group A, others in treatment group B, depending on the immunosuppressive med they’re on). Due to the transplant, these patients are at high risk for developing donor derived HBV infections.

What’s needed: The investigator is interested in time until infection starts (first instance of HBV) and the proportion that develop the infection over time. They also want the cumulative incidence of HBV infection at baseline, and at each of the post-transplant follow-up timepoints (6 months, 12 months, 18 months and 24 months) for group A and group B. For example, the 6-months data would be the proportion of those patients with 6-months follow-up who ever had HBV, the 12-months data would the proportion of those with 12-months data follow-up who ever had HBV and so on.

Cumulative incidence in this specific case just means 1 minus the survival function, without accounting for any competing risks. The analysis population has no deaths or loss to follow up.

My questions are:

  1. How do I get the cumulative incidence by treatment group over time? (I want to also display the numbers at risk and number of events for each group under the plot)
  2. Is it possible to also display a log rank p-value on the cumulative incidence plot, to see if there’s any significant difference between the groups in terms of time to first infection or does that not make any sense?
  3. Any way to also obtain the cumulative incidence numbers at each time point, for each group with the std.err and 95%CI, similar to the life tables we get when we use summary(km) below? these life tables give me the survival probabilities so I guess if I want cumulative incidence, I could just manually do 1-survival probability but not sure how to obtain the std.err and confidence intervals?

Below is a test dataset similar to the actual one, and what I have done so far:


time<-c(1.5989,6.9433, 0.8890, 3.2691, 1.0514, 2.7625, 1.4319, 0.9681, 7.4416, 0.0268, 1.5168, 1.9647, 0.0657, 4.3571, 6.4490, 0.2198, 1.2028, 0.9555, 0.2601, 2.0096, 7.5156, 0.4463, 0.2355, 0.9391, 2.6996)

censor<-c(1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0)

group<-c(1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 2, 2, 1, 2, 1)

df<-data.frame(ID, time, censor, group)
View(df)

km<-survfit(formula = Surv(time, censor) ~ group, data = df)
summary(km)

#cumulative incidence plot
plot(km, fun = function(x) 1-x)

#log rank test;
survdiff(Surv(time, censor) ~ group, data=df)
#plot survival curves for each treatment group
plot(survfit(Surv(time, censor) ~ group, data = df), 
     xlab = "Time", 
     ylab = "Overall survival probability")

Solution

  • It is better to use survminer package to draw survival plots by its ggsurvplot() function.

    fun argument of ggsurvplot() can be "event" for cumulative incidence or "cumhaz" for cumulative hazard function, or if left by default, would plot Kaplan Meier curve.

    Additional arguments such as pvalue also can be customized.
    pvalue method is kaplan meier by default.

    You can see more examples in survminer documentations.

    library(survminer)
    #> Loading required package: ggplot2
    #> Loading required package: ggpubr
    
    time <- c(1.5989,6.9433, 0.8890, 3.2691, 1.0514, 2.7625, 1.4319, 0.9681, 7.4416, 
              0.0268, 1.5168, 1.9647, 0.0657, 4.3571, 6.4490, 0.2198, 1.2028, 0.9555, 
              0.2601, 2.0096, 7.5156, 0.4463, 0.2355, 0.9391, 2.6996)
    
    censor <- c(1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0)
    
    group <- c(1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 2, 2, 1, 2, 1)
    
    df <- data.frame(time, censor, group)
    
    km <- survfit(formula = Surv(time, censor) ~ group, data = df)
    
    
    #cumulative incidence plot 
    ggsurvplot(km,
               data = df,
               censor = T,
               risk.table = TRUE,
               legend.labs = c("group 1", "group 2"),
               xlim = c(0,10),
               ylim = c(0,1),
               pval = T,
               pval.method = T,
               pval.method.coord = c(2.5,0.5),
               pval.coord = c(4.2,0.5),
               xlab = "Months",
               ylab = "SURVIVAL PROBABILITY",
               linetype = c(1,2),
               legend.title = "",
               palette = c('red', 'blue'),
               fun="event"
    ) 
    
    

    enter image description here

    Created on 2023-02-12 with reprex v2.0.2