r survival-analysis

Creating time variables for conditional risk set model (Cox regression)


I'm preparing a dataset to fit a conditional risk set model using stratified Cox regression, and I was wondering whether there is a way to create the variables I need without running time-consuming loops.

Basically, my data frame is like this, showing whether and when a given country (ID) has experienced some event during a specific time period:

 year    ID event   time
 1991    UK     0      1
 1992    UK     0      2
 1993    UK     0      3
 1994    UK     0      4
 1995    UK     0      5
 1996    UK     0      6
 1997    UK     0      7
 1998    UK     0      8

 1991    FR     0      1
 1992    FR     1      2
 1993    FR     1      3
 1994    FR     0      4
 1995    FR     0      5
 1996    FR     1      6
 1997    FR     0      7
 1998    FR     0      8

 1991    IT     1      1
 1992    IT     0      2
 1993    IT     0      3
 1994    IT     0      4
 1995    IT     0      5
 1996    IT     1      6
 1997    IT     0      7
 1998    IT     0      8
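
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the table above as a data frame. The name df is just an assumption for illustration, and the event values are copied by hand from the rows shown.

```r
# Rebuild the example table; event values are copied from the rows above
df <- data.frame(
  year  = rep(1991:1998, times = 3),
  ID    = rep(c("UK", "FR", "IT"), each = 8),
  event = c(0, 0, 0, 0, 0, 0, 0, 0,   # UK
            0, 1, 1, 0, 0, 1, 0, 0,   # FR
            1, 0, 0, 0, 0, 1, 0, 0),  # IT
  stringsAsFactors = FALSE
)

# time counts the rows (years) within each country, starting at 1
df$time <- ave(df$year, df$ID, FUN = seq_along)
```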

I need to create two more variables: a conditional time variable, similar to time but 'resetting the clock' each time an event occurs; and a sequence variable that indicates which stage the country is in, i.e., whether the next event would be the first, second, third... (the number should increase on the row after each event). The data would then look like this:

 year    ID event   time cond.time sequence
 1991    UK     0      1         1        1
 1992    UK     0      2         2        1
 1993    UK     0      3         3        1
 1994    UK     0      4         4        1
 1995    UK     0      5         5        1
 1996    UK     0      6         6        1
 1997    UK     0      7         7        1
 1998    UK     0      8         8        1

 1991    FR     0      1         1        1
 1992    FR     1      2         2        1
 1993    FR     1      3         1        2
 1994    FR     0      4         1        3
 1995    FR     0      5         2        3
 1996    FR     1      6         3        3
 1997    FR     0      7         1        4
 1998    FR     0      8         2        4

 1991    IT     1      1         1        1
 1992    IT     0      2         1        2
 1993    IT     0      3         2        2
 1994    IT     0      4         3        2
 1995    IT     0      5         4        2
 1996    IT     1      6         5        2
 1997    IT     0      7         1        3
 1998    IT     0      8         2        3

Does anyone know how this could be done efficiently? I tried to do it with the ddply function, but couldn't work out how.


Solution

  • You can use the data.table package. If df is your original data.frame:

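    library(magrittr)
    library(data.table)
    dt = data.table(df)
    
    # temp: the previous row's event within each ID (0 on the first row).
    # shift() is data.table's lag; stats::lag() does not shift a plain vector.
    dt[, temp := shift(event, 1L, fill = 0L), by = ID]
    # sequence: number of events seen before the current row, plus one
    dt[, sequence := cumsum(temp) + 1L, by = ID]
    
    # Lengths of the spells that start on the first row and after each event
    func = function(x)
    {
        which(c(1, shift(x, 1L)[-1]) %in% 1) %>%
        c(length(x) + 1) %>% 
        diff
    }
    
    # cond.time: count 1, 2, ... within each spell
    dt[, cond.time := func(event) %>% lapply(seq) %>% unlist, by = ID]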
    
    > dt
        year ID event time temp sequence cond.time
     1: 1991 UK     0    1    0        1         1
     2: 1992 UK     0    2    0        1         2
     3: 1993 UK     0    3    0        1         3
     4: 1994 UK     0    4    0        1         4
     5: 1995 UK     0    5    0        1         5
     6: 1996 UK     0    6    0        1         6
     7: 1997 UK     0    7    0        1         7
     8: 1998 UK     0    8    0        1         8
     9: 1991 FR     0    1    0        1         1
    10: 1992 FR     1    2    0        1         2
    11: 1993 FR     1    3    1        2         1
    12: 1994 FR     0    4    1        3         1
    13: 1995 FR     0    5    0        3         2
    14: 1996 FR     1    6    0        3         3
    15: 1997 FR     0    7    1        4         1
    16: 1998 FR     0    8    0        4         2
    17: 1991 IT     1    1    0        1         1
    18: 1992 IT     0    2    1        2         1
    19: 1993 IT     0    3    0        2         2
    20: 1994 IT     0    4    0        2         3
    21: 1995 IT     0    5    0        2         4
    22: 1996 IT     1    6    0        2         5
    23: 1997 IT     0    7    1        3         1
    24: 1998 IT     0    8    0        3         2
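
To see why this works, it may help to trace the logic on a single country. The sketch below uses FR's event column from the question and a somewhat shorter formulation of the same idea, based on data.table's shift() and rowid(). The variable names prev, sequence and cond.time are only illustrative, and the rows are assumed to be sorted by year.

```r
library(data.table)

# FR's event column, copied from the question
event <- c(0, 1, 1, 0, 0, 1, 0, 0)

# Previous row's event, with 0 for the first row
prev <- shift(event, 1L, fill = 0)

# The stage increases on the row after each event
sequence <- cumsum(prev) + 1

# rowid() numbers the rows within each run of equal sequence values,
# which restarts the clock after every event
cond.time <- rowid(sequence)

# On the full table the same expressions can be applied per country, e.g.
#   dt[, sequence := cumsum(shift(event, 1L, fill = 0)) + 1, by = ID]
#   dt[, cond.time := seq_len(.N), by = .(ID, sequence)]
```

Grouping by both ID and sequence in the last commented line avoids the helper function: each ID/sequence pair is one spell, so a plain row counter within that group is the conditional time.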