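Hello, I am learning about survival analysis and was curious whether I could use the survival package on survival data of this form (one row per time interval, giving the number alive at the start of the interval and the number who died during it):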
Here is some code to generate data in this form:
start_interval <- seq(0, 13)
end_interval <- seq(1, 14)
living_at_start <- round(seq(1000, 0, length.out = 14))
dead_in_interval <- c(abs(diff(living_at_start)), 0)
df <- data.frame(start_interval, end_interval, living_at_start, dead_in_interval)
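As a quick sanity check on the simulated table (an illustrative sketch, rebuilt so it runs on its own; `total_deaths` and `drop_in_alive` are hypothetical names), the deaths should add up to the 1000 people alive at the start, and each interval's deaths should equal the drop in the number alive from one interval to the next:

```r
# Rebuild the simulated grouped data from the question
start_interval   <- seq(0, 13)
end_interval     <- seq(1, 14)
living_at_start  <- round(seq(1000, 0, length.out = 14))
dead_in_interval <- c(abs(diff(living_at_start)), 0)
df <- data.frame(start_interval, end_interval, living_at_start, dead_in_interval)

# Every one of the 1000 starting individuals dies somewhere in the table
total_deaths <- sum(df$dead_in_interval)
print(total_deaths)

# Deaths in interval k = alive at start of k minus alive at start of k + 1
drop_in_alive <- -diff(df$living_at_start)
print(all(df$dead_in_interval[-nrow(df)] == drop_in_alive))
```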
From my use of the survival package so far, it seems to expect each individual to be a separate survival time, but I might be misreading the documentation of the Surv function. If survival will not work, what other packages are out there for this type of data?
If there is no package or function that estimates the survival function easily, I can calculate the survival probabilities myself with the following equation.
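The equation itself is not shown in this post. As a hedged stand-in, a standard by-hand estimator for grouped data like this, assuming no censoring (everyone alive at the start of an interval either survives it or dies in it), is the product-limit form, which telescopes to the proportion still alive. Here $n_k$ is `living_at_start` and $d_k$ is `dead_in_interval` for the interval starting at $t_k$:

$$
\hat S(t_k) = \prod_{j=0}^{k-1}\left(1 - \frac{d_j}{n_j}\right)
            = \prod_{j=0}^{k-1}\frac{n_{j+1}}{n_j}
            = \frac{n_k}{n_0},
\qquad \text{since } n_{j+1} = n_j - d_j .
$$

The final form $n_k/n_0$ is the same quantity as `df$living_at_start/max(df$living_at_start)`, because $n_0$ is the largest count.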
Since the survival package needs one observation per survival time, we need to transform the data first. Using the simulated data:
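Duplicating rows works, but as an alternative sketch, `survfit()` also accepts case weights through its `weights` argument, so the counts can be used directly. The choices below are assumptions rather than part of the original approach: each death is placed at the end of its interval (`end_interval`), every row is a death (`status = 1`, no censoring), and the empty final interval is dropped. `living`, `df_d` and `fit_w` are hypothetical names.

```r
library(survival)

# Rebuild the simulated grouped data (same values as in the question)
living <- round(seq(1000, 0, length.out = 14))
df <- data.frame(start_interval   = 0:13,
                 end_interval     = 1:14,
                 living_at_start  = living,
                 dead_in_interval = c(-diff(living), 0))

# Assumption: drop intervals with no deaths (only the last one here)
df_d <- df[df$dead_in_interval > 0, ]

# Assumption: no censoring, so every row is a death (status = 1),
# recorded at the end of its interval
df_d$status <- 1

# The death counts enter as case weights instead of duplicated rows
fit_w <- survfit(Surv(end_interval, status) ~ 1,
                 data = df_d, weights = dead_in_interval)
summary(fit_w)
```

Under these assumptions the estimate at `end_interval` k equals the proportion alive at the start of the next interval, i.e. `living_at_start[k + 1] / 1000`, and the weighted number at risk at the first time is 1000.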
Simulated Data:
library(survival)
start_interval <- seq(0, 13)
end_interval <- seq(1, 14)
living_at_start <- round(seq(1000, 0, length.out = 14))
dead_in_interval <- c(abs(diff(living_at_start)), 0)
df <- data.frame(start_interval, end_interval, living_at_start, dead_in_interval)
Transforming the data by duplicating each row by the number of deaths in that interval:
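duptimes <- df$dead_in_interval            # number of deaths in each interval
rid <- rep(seq_len(nrow(df)), duptimes)    # row index repeated once per death
df.t <- df[rid, ]                          # one row per death (1000 rows)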
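To see what the `rep()` step does, here is a toy example with three rows repeated 2, 0 and 1 times, followed by a check on the real data (rebuilt so the block runs on its own) that the expanded frame has one row per death. `idx` and `living` are hypothetical names.

```r
# Toy example: rep() repeats each index by the matching count
idx <- rep(1:3, c(2, 0, 1))
print(idx)   # 1 1 3  (row 2 disappears because its count is 0)

# Rebuild the grouped data and expand it to one row per death
living <- round(seq(1000, 0, length.out = 14))
df <- data.frame(start_interval = 0:13, end_interval = 1:14,
                 living_at_start = living,
                 dead_in_interval = c(-diff(living), 0))
rid  <- rep(seq_len(nrow(df)), df$dead_in_interval)
df.t <- df[rid, ]

print(nrow(df.t))                  # total number of deaths
print(table(df.t$start_interval))  # deaths per interval start
```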
Using the Surv Function
test <- Surv(time = df.t$start_interval,
time2 = df.t$end_interval,
event = rep(1, nrow(df.t)), #Every Observation is a death
type = "interval")
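One detail worth checking here: according to the `Surv` documentation, for `type = "interval"` the event codes are 0 = right censored, 1 = event at `time`, 2 = left censored and 3 = interval censored, and `time2` is ignored unless the code is 3. With `event = 1`, each death is therefore treated as happening exactly at `start_interval`. If the intent is "died somewhere between the start and end of the interval", a sketch of that variant would use code 3 (`test3` and `fit3` are hypothetical names). As I understand it, survfit then uses its interval-censored (Turnbull-type) estimator, so the result is not guaranteed to match the by-hand numbers.

```r
library(survival)

# Rebuild the grouped data and expand to one row per death
living <- round(seq(1000, 0, length.out = 14))
df <- data.frame(start_interval = 0:13, end_interval = 1:14,
                 living_at_start = living,
                 dead_in_interval = c(-diff(living), 0))
df.t <- df[rep(seq_len(nrow(df)), df$dead_in_interval), ]

# event = 3: the death is only known to lie between start and end of the interval
test3 <- Surv(time  = df.t$start_interval,
              time2 = df.t$end_interval,
              event = rep(3, nrow(df.t)),
              type  = "interval")

fit3 <- survfit(test3 ~ 1)
summary(fit3)
```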
Fitting the survival curve
summary(survfit(test ~ 1))
Comparing with a by-hand calculation from the original data:
df$living_at_start/max(df$living_at_start)
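The by-hand proportion can also be cross-checked against the product-limit (life-table) form, which multiplies the conditional survival `1 - deaths / at risk` across intervals; with no censoring the two should agree. This is a small sketch with hypothetical names (`prop_alive`, `product_limit`); the last interval is excluded from the ratios because nobody is at risk in it.

```r
living <- round(seq(1000, 0, length.out = 14))
dead   <- c(-diff(living), 0)

# Proportion still alive at the start of each interval (the by-hand version)
prop_alive <- living / max(living)

# Product-limit form: S(t_k) = prod over j < k of (1 - d_j / n_j)
k_last <- length(living)
product_limit <- c(1, cumprod(1 - dead[-k_last] / living[-k_last]))

print(cbind(prop_alive, product_limit))
```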
They match.
When using the survfit function, why is the number at risk 1001 at time 0 when there are only 1000 people in the data?
length(test)
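To dig into the at-risk count, it can help to put the number of rows in the `Surv` object next to the first few times, at-risk counts and event counts that the fitted object reports. This is only a way to inspect the numbers, not an explanation of the 1001; `fit` is a hypothetical name, and the block rebuilds `test` so it runs on its own.

```r
library(survival)

living <- round(seq(1000, 0, length.out = 14))
df <- data.frame(start_interval = 0:13, end_interval = 1:14,
                 living_at_start = living,
                 dead_in_interval = c(-diff(living), 0))
df.t <- df[rep(seq_len(nrow(df)), df$dead_in_interval), ]

test <- Surv(time = df.t$start_interval, time2 = df.t$end_interval,
             event = rep(1, nrow(df.t)), type = "interval")
fit <- survfit(test ~ 1)

nrow(test)   # rows in the Surv object (one per death)
head(data.frame(time    = fit$time,     # reported times
                n.risk  = fit$n.risk,   # number at risk at each time
                n.event = fit$n.event), # events at each time
     3)
```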