r survival-analysis

R Function for Handling Survival Data in intervals


Hello, I am learning about survival analysis, and I was curious whether I could use the survival package on survival data of this form:

[Image: example of the interval survival data]

Here is some code to generate data in this form:

start_interval <-  seq(0, 13)
end_interval <-  seq(1, 14)
living_at_start <- round(seq(1000, 0, length.out = 14))
dead_in_interval <- c(abs(diff(living_at_start)), 0)
df <- data.frame(start_interval, end_interval, living_at_start, dead_in_interval)

From my use of the survival package so far, it seems to expect one row per individual, each with its own survival time, but I might be misreading the documentation of the Surv function. If survival will not work, what other packages are out there for this type of data? If there is no package or function that estimates the survival function easily, I can calculate it myself with the following equation.

[Image: equation for the by-hand survival estimate]
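
The equation itself is not reproduced in the text, so the following is only one plausible reading of a by-hand calculation, not necessarily the formula from the question. It is a minimal sketch of the product-limit idea applied to the life table above, assuming there is no censoring (everyone is followed until death). The names `p_survive`, `surv_end`, and `surv_direct` are made up for illustration.

```r
# Life table from the question (assumed: no censoring)
start_interval <- seq(0, 13)
end_interval <- seq(1, 14)
living_at_start <- round(seq(1000, 0, length.out = 14))
dead_in_interval <- c(abs(diff(living_at_start)), 0)

# Conditional probability of surviving each interval, given alive at its start.
# Nobody is at risk in the last interval, so 0/0 would be NaN; treat it as 1.
p_survive <- ifelse(living_at_start > 0,
                    1 - dead_in_interval / living_at_start,
                    1)

# Product-limit estimate of S(t) at the END of each interval
surv_end <- cumprod(p_survive)

# Without censoring this reduces to (alive at end of interval) / (initial cohort)
alive_at_end <- living_at_start - dead_in_interval
surv_direct <- alive_at_end / living_at_start[1]

life_table <- data.frame(end_interval, surv_end, surv_direct)
print(life_table)
```

The two columns agree up to floating-point error, which is why the simple ratio is enough for data like this. The cumulative product form only becomes necessary once some individuals are censored.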


Solution

  • Since the survival package needs one observation per survival time, the data has to be transformed first. The steps below use the simulated data from the question.

    Simulated Data:

    library(survival)
    start_interval <-  seq(0, 13)
    end_interval <-  seq(1, 14)
    living_at_start <- round(seq(1000, 0, length.out = 14))
    dead_in_interval <- c(abs(diff(living_at_start)), 0)
    df <- data.frame(start_interval, end_interval, living_at_start, dead_in_interval)
    

    Transforming the data by duplicating each row once per death in its interval

    duptimes <- df$dead_in_interval
    rid <- rep(1:nrow(df), duptimes)
    df.t <- df[rid,]
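
    A quick sanity check on the expansion can be sketched as follows. It rebuilds the same data so it runs on its own. The variable `deaths_per_interval` is a name made up for this check.

    ```r
    # Rebuild the life table from the answer
    start_interval <- seq(0, 13)
    end_interval <- seq(1, 14)
    living_at_start <- round(seq(1000, 0, length.out = 14))
    dead_in_interval <- c(abs(diff(living_at_start)), 0)
    df <- data.frame(start_interval, end_interval, living_at_start, dead_in_interval)

    # Repeat each row index once per death in that interval
    rid <- rep(seq_len(nrow(df)), times = df$dead_in_interval)
    df.t <- df[rid, ]

    # One row per death: the row count should equal the total number of deaths
    stopifnot(nrow(df.t) == sum(df$dead_in_interval))

    # Rows per interval should reproduce dead_in_interval
    # (intervals with zero deaths contribute no rows and drop out)
    deaths_per_interval <- table(df.t$start_interval)
    print(deaths_per_interval)
    ```

    Note that an interval with no deaths, such as the last one here, disappears from the expanded data entirely.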
    

    Using the Surv Function

    test <- Surv(time = df.t$start_interval,
                 time2 = df.t$end_interval,
                 event = rep(1, nrow(df.t)), # every observation is a death
                 type = "interval")
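
    One thing worth checking here: the Surv documentation lists the event codes for interval data as 0 = right censored, 1 = event at time, 2 = left censored, 3 = interval censored. With event = 1, each death is therefore recorded as an exact event at time (here start_interval) rather than as a death somewhere inside the interval. The sketch below compares that coding with type = "interval2", which takes only the two endpoints. The names `exact` and `censored` are made up for illustration, and whether interval censoring is the better description of this data is a modelling choice.

    ```r
    library(survival)

    # Life table and one-row-per-death expansion, as in the answer
    start_interval <- seq(0, 13)
    end_interval <- seq(1, 14)
    living_at_start <- round(seq(1000, 0, length.out = 14))
    dead_in_interval <- c(abs(diff(living_at_start)), 0)
    df <- data.frame(start_interval, end_interval, living_at_start, dead_in_interval)
    df.t <- df[rep(seq_len(nrow(df)), times = df$dead_in_interval), ]

    # Coding used in the answer: event = 1 marks an exact event at `time`
    exact <- Surv(time = df.t$start_interval,
                  time2 = df.t$end_interval,
                  event = rep(1, nrow(df.t)),
                  type = "interval")

    # Alternative: "interval2" infers the censoring type from the two endpoints
    censored <- Surv(time = df.t$start_interval,
                     time2 = df.t$end_interval,
                     type = "interval2")

    # The third column of an interval Surv object holds the status code
    print(table(unclass(exact)[, 3]))
    print(table(unclass(censored)[, 3]))
    ```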
    

    Fitting the survival curve

    summary(survfit(test ~ 1))
    

    [Image: output of summary(survfit(test ~ 1))]

    Comparing with a by-hand calculation from the original data

    df$living_at_start/max(df$living_at_start)
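
    The comparison can also be made in code rather than by eye. The sketch below swaps in a different coding from the one above: an ordinary right-censored Surv object with every death placed at the end of its interval, fitted with the default Kaplan-Meier estimator. That placement is an assumption for illustration, and `fit` and `by_hand` are made-up names.

    ```r
    library(survival)

    # Life table and one-row-per-death expansion, as in the answer
    start_interval <- seq(0, 13)
    end_interval <- seq(1, 14)
    living_at_start <- round(seq(1000, 0, length.out = 14))
    dead_in_interval <- c(abs(diff(living_at_start)), 0)
    df <- data.frame(start_interval, end_interval, living_at_start, dead_in_interval)
    df.t <- df[rep(seq_len(nrow(df)), times = df$dead_in_interval), ]

    # Assumption: each death happens exactly at the end of its interval
    fit <- survfit(Surv(df.t$end_interval, rep(1, nrow(df.t))) ~ 1)

    # By-hand survival at the end of each interval that contains deaths
    by_hand <- df$living_at_start[-1] / df$living_at_start[1]

    # Side-by-side table of event times, number at risk, and both estimates
    print(data.frame(time = fit$time, n.risk = fit$n.risk,
                     km = fit$surv, by_hand = by_hand))

    # TRUE if the two estimates agree up to floating-point tolerance
    print(isTRUE(all.equal(fit$surv, by_hand)))
    ```

    With this coding the number at risk at the first event time is the full cohort size.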
    

    [Image: output of the by-hand calculation]

    They match.

    Questions

    When using the survfit function, why is the number at risk 1001 at time 0 when there are only 1000 people in the data?

    length(test)
    

    [Image: output of length(test)]