Irregular interval-based representation of survival data in R


I have the following dataset:

df =
id Time A
1  3    0
1  5    1
1  6    1
2  8    0
2  9    0
2  12   1

I want to do two things: (i) give every id a starting time of -1, and (ii) split the time into two columns, start and end, while preserving the time at which the individual got the observation A (using end as the reference point). The final result should look something like this:

df = 
id start end A
1  -1     0  0  
1  0      2  1
1  2      3  1
2  -1     0  0
2  0      1  0
2  1      4  1

Solution

  • This does the trick for this dataset. I wasn't 100% sure what the question was asking from the description, so I went by the expected output shown above. For future reference, please paste the output of dput(df) as the input data :)

    df <- data.frame(id=c(rep(1,3),rep(2,3)),
                     Time=c(3,5,6,8,9,12),
                     A=c(0,1,1,0,0,1))
    
    library(data.table)
    dt <- as.data.table(df)
    # diff(Time) gives the gap between consecutive observations within each id;
    # cumsum() over c(0, gaps) accumulates those gaps, so end is the time
    # elapsed since that id's first observation
    dt[, end := cumsum(c(0, diff(Time))), by=id]
    
    # start is then just a shifted version of end, with the initial start filled as -1
    dt[, start := shift(end, n=1, fill=-1), by=id]
    
    out <- as.data.frame(dt)
    out
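
For comparison, here is a minimal base R sketch of the same idea, in case you would rather not load data.table. It assumes the rows are already sorted by id and Time, as in the example data. It also reorders the columns to the id/start/end/A layout from the question, which the code above does not do. The anonymous helper functions are only illustrative.

```r
# Rebuild the example data from the question
df <- data.frame(id   = c(1, 1, 1, 2, 2, 2),
                 Time = c(3, 5, 6, 8, 9, 12),
                 A    = c(0, 1, 1, 0, 0, 1))

# end: time elapsed since each id's first observation
# (the same values as cumsum(c(0, diff(Time))) within each id)
df$end <- ave(df$Time, df$id, FUN = function(t) t - t[1])

# start: the previous end within the id, with -1 for the first row
df$start <- ave(df$end, df$id, FUN = function(e) c(-1, head(e, -1)))

# Reorder the columns to match the layout requested in the question
out <- df[, c("id", "start", "end", "A")]
out
```

On the example data this should give the same start and end values as the data.table version. If your real data is not sorted, order it by id and Time first, since both approaches depend on row order within each id.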