Search code examples
rplotdurationsurvival-analysis

R: Plot Survival curve with types of death as stacked area chart


I have a dataset which is right censored containing information of life times and different types of deaths for a given sample and I want to produce a plot of a survival curve (with the actual values that would be calculated from the sample and not from a model estimation) with the different types of death as a stacked area chart, something like this: desired plot

How can I accomplish this in R?

The dataset would look something like this:

death type time event
1       Type3   81     1
2          NA  868     0
3       Type3 1022     1
4          NA  868     0
5          NA  868     0
6          NA  868     0
7          NA  868     0
8          NA  887     0
9       Type3  156     1
10         NA  868     0
11         NA  868     0
12         NA  868     0
13      Type3  354     1
14      Type3  700     1
15      Type3  632     1
16         NA  868     0
17      Type1  308     1
18         NA 1001     0
19         NA 1054     0
20         NA 1059     0
21      Type3  120     1
22         NA  732     0
23      Type3  543     1
24      Type1  379     1
25         NA  613     0
26         NA 1082     0
27      Type3  226     1
28      Type2    1     0
29         NA  976     0
30         NA 1000     0
31         NA  706     0
32         NA 1015     0
33      Type3  882     1
34         NA 1088     0
35         NA  642     0
36      Type3  953     1
37         NA 1068     0
38         NA  819     0
39         NA 1029     0
40      Type3   34     1
41         NA 1082     0
42      Type3  498     1
43         NA  923     0
44         NA 1041     0
45      Type3  321     1
46         NA  557     0
47         NA  628     0
48      Type3  197     1
49      Type3  155     1
50         NA  955     0

Where death type with NA indicates censored data, time is the time of death or time of censoring, and event is 1 for those who are dead and 0 for those who are censored. (This is the format required by 'survfit' but I also have it with actual start and end times as dates)

(Now, with only 50 points it wouldn't be possible to construct such a curve, but the data has a lot more rows that wouldn't fit here).


Solution

  • It's an ugly bit of code, but it gets the idea in. I didn't take the time to figure out how to add the legend. Please also note that this kind of figure, while interesting in concept, isn't necessarily going to mirror a KM curve. To be honest, if you're going to present the data this way, it makes more sense to do it as stacked bars at fixed time points.

    Please note, I'm pretty sure there are some flaws lurking in this code. It comes with no warranty, but might get you started.

    SurvData <- structure(list(row.names = c("", "", "", "", "", "", "", "", 
    "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
    "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
    "", "", "", "", "", "", "", "", "", ""), death = 1:50, type = c("Type3", 
    NA, "Type3", NA, NA, NA, NA, NA, "Type3", NA, NA, NA, "Type3", 
    "Type3", "Type3", NA, "Type1", NA, NA, NA, "Type3", NA, "Type3", 
    "Type1", NA, NA, "Type3", "Type2", NA, NA, NA, NA, "Type3", NA, 
    NA, "Type3", NA, NA, NA, "Type3", NA, "Type3", NA, NA, "Type3", 
    NA, NA, "Type3", "Type3", NA), time = c(81L, 868L, 1022L, 868L, 
    868L, 868L, 868L, 887L, 156L, 868L, 868L, 868L, 354L, 700L, 632L, 
    868L, 308L, 1001L, 1054L, 1059L, 120L, 732L, 543L, 379L, 613L, 
    1082L, 226L, 1L, 976L, 1000L, 706L, 1015L, 882L, 1088L, 642L, 
    953L, 1068L, 819L, 1029L, 34L, 1082L, 498L, 923L, 1041L, 321L, 
    557L, 628L, 197L, 155L, 955L), event = c(1L, 0L, 1L, 0L, 0L, 
    0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 
    0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 
    0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 0L)), .Names = c("row.names", 
    "death", "type", "time", "event"), class = "data.frame", row.names = c(NA, 
    -50L))
    
    library(dplyr)
    library(zoo)
    library(RColorBrewer)
    SurvDataSummary <- 
      arrange(SurvData, time, type) %>%
      mutate(type = ifelse(is.na(type), "Alive", type)) %>%
      group_by(time) %>%
      #* Count the number of each type at each time point
      summarise(n_at_time = n(),
                alive_at_time = sum(type == "Alive"),
                type1_at_time = sum(type == "Type1"),
                type2_at_time = sum(type == "Type2"),
                type3_at_time = sum(type == "Type3")) %>%
      ungroup() %>%
      mutate(n_alive = sum(n_at_time) - cumsum(lag(n_at_time, default = 0)),
             #* Proportion of each type
             p_type1_at_time = type1_at_time / n_alive,
             p_type2_at_time = type2_at_time / n_alive,
             p_type3_at_time = type3_at_time / n_alive,
             #* convert 0 to NA
             p_type1_at_time = ifelse(p_type1_at_time == 0, NA, p_type1_at_time),
             p_type2_at_time = ifelse(p_type2_at_time == 0, NA, p_type2_at_time),
             p_type3_at_time = ifelse(p_type3_at_time == 0, NA, p_type3_at_time),
             #* Back fill NAs with last known value
             p_type1_at_time = na.locf(p_type1_at_time, FALSE),
             p_type2_at_time = na.locf(p_type2_at_time, FALSE),
             p_type3_at_time = na.locf(p_type3_at_time, FALSE),
             #* make leading NAs 0
             p_type1_at_time = ifelse(is.na(p_type1_at_time), 0, p_type1_at_time),
             p_type2_at_time = ifelse(is.na(p_type2_at_time), 0, p_type2_at_time),
             p_type3_at_time = ifelse(is.na(p_type3_at_time), 0, p_type3_at_time),
             #* Calculate cumulative proportions
             p_alive_at_time = 1 - p_type1_at_time - p_type2_at_time - p_type3_at_time,
             cump_type1_at_time = p_alive_at_time + p_type1_at_time,
             cump_type2_at_time = cump_type1_at_time + p_type2_at_time,
             cump_type3_at_time = cump_type2_at_time + p_type3_at_time,
             #* Get the following time for using geom_rect
             next_time = lead(time)) %>%
    
    pal <- brewer.pal(4, "PRGn")
    ggplot(SurvDataSummary,
           aes(xmin = time,
               xmax = next_time)) + 
      geom_rect(aes(ymin = 0, ymax = p_alive_at_time), fill = pal[1]) + 
      geom_rect(aes(ymin = p_alive_at_time, ymax = cump_type1_at_time), fill = pal[2]) + 
      geom_rect(aes(ymin = cump_type1_at_time, ymax = cump_type2_at_time), fill = pal[3]) + 
      geom_rect(aes(ymin = cump_type2_at_time, ymax = cump_type3_at_time), fill = pal[4])
    

    enter image description here