Tags: r, plot, survival-analysis

Which curve is which in Survival Function plot?


I am plotting survival functions with the survival package. Everything works fine, but how do I know which curve is which? And how can I add it to a legend?

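  library(survival)  # provides Surv() and survfit()

  url <- "http://socserv.mcmaster.ca/jfox/Books/Companion/data/Rossi.txt"
  Rossi <- read.table(url, header = TRUE)[, 1:10]
  # Kaplan-Meier estimate, one curve per level of race
  km <- survfit(Surv(week, arrest) ~ race, data = Rossi)
  plot(km, lty = c(1, 2))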

Solution

  • how do I know which curve is which?

    Calling str(km) shows which elements km contains. km$strata holds the values 48 and 10, the number of time points stored for each stratum. This matches km$surv, where the first 48 values and the last 10 values each form their own declining sequence:

    km$surv[1:48]
    km$surv[49:58]
    

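    So, in addition to the hint given by the order of the strata in print(km), with this particular dataset we can be sure that the first 48 elements belong to race=black and the last 10 to race=other. The curves are drawn in that same order, so lty=1 is race=black and lty=2 is race=other.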
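The strata order can also be read directly from the fitted object instead of being inferred from the values. The following is a minimal sketch: it assumes the `aml` dataset that ships with the survival package as a stand-in, so that it runs without downloading Rossi.txt. The name `fit` is only illustrative, and the same calls should apply to `km`.

```r
library(survival)

# Stand-in data (assumption): `aml` ships with the survival package and has a
# two-level grouping variable `x`, playing the role of `race` above.
fit <- survfit(Surv(time, status) ~ x, data = aml)

# Named vector: the names are the strata labels, the values are the number of
# time points stored for each stratum.
print(fit$strata)

# The labels in the order in which the curves are stored and drawn.
strata_labels <- names(fit$strata)
print(strata_labels)

# The per-stratum counts add up to the length of the stacked vectors.
stopifnot(sum(fit$strata) == length(fit$surv))
```

The i-th entry of `strata_labels` corresponds to the i-th value passed to `lty` or `col` in `plot()`.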

  • And how can I add it to a legend?

    Unlike many other model outputs, km is not easily converted to a data.frame. However, we can extract the relevant elements ourselves, build a data.frame from them, and plot that.

    First we create a factor that labels each row with its stratum: 48 rows for race=black and 10 rows for race=other.

    race <- as.factor(c(rep("black", 48), rep("other", 10)))
    df <- data.frame(surv = km$surv, race = race, time = km$time)
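The counts 48 and 10 are specific to this dataset. As a hedged alternative, the factor can be derived from the strata element itself, so nothing has to be hard-coded. The sketch below again assumes the packaged `aml` data as a stand-in; `fit` and `df2` are illustrative names, not part of the original code.

```r
library(survival)

# Stand-in data (assumption): `aml` from the survival package.
fit <- survfit(Surv(time, status) ~ x, data = aml)

# Repeat each stratum label as many times as that stratum has time points.
# This lines up with fit$time and fit$surv, which are stored stratum by stratum.
df2 <- data.frame(
  time   = fit$time,
  surv   = fit$surv,
  strata = factor(rep(names(fit$strata), times = fit$strata))
)

head(df2)
```

The labels come out in the form `x=Maintained`; for `km` they would be `race=black` and `race=other`.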
    

    Next we can plot it as usual (in my case, using ggplot2). Because colour is mapped to race, ggplot2 adds the legend automatically.

    library(ggplot2)
    ggplot(data = df, aes(x = time, y = surv)) + 
        geom_point(aes(colour = race)) + 
        geom_line(aes(colour = race)) +
        theme_bw()
    

    [Figure: survival by race]
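If you would rather stay with the base plot from the question, a legend can be attached with `legend()`, reusing the strata names and the same line types that were passed to `plot()`. This is a minimal sketch, again using the packaged `aml` data as a stand-in (an assumption, to keep it runnable offline). The name `line_types`, the axis labels and the legend position are illustrative choices.

```r
library(survival)

# Stand-in data (assumption): `aml` from the survival package.
fit <- survfit(Surv(time, status) ~ x, data = aml)

# Use one vector of line types for both the curves and the legend so they
# cannot get out of sync.
line_types <- c(1, 2)

plot(fit, lty = line_types, xlab = "Time", ylab = "Survival probability")

# Curves are drawn in strata order, so names(fit$strata) lines up with lty.
legend("topright", legend = names(fit$strata), lty = line_types, bty = "n")
```

For the question's object this would be `legend("topright", legend = names(km$strata), lty = c(1, 2))` placed after the `plot(km, ...)` call.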