I'm preparing a dataset to fit a conditional risk set model using stratified Cox regression, and I was wondering whether there is a way to create the variables I need without running time-consuming loops.
Basically, my data frame looks like this, showing whether and when a given country (ID) has experienced some event during a specific time period:
year ID event time
1991 UK 0 1
1992 UK 0 2
1993 UK 0 3
1994 UK 0 4
1995 UK 0 5
1996 UK 0 6
1997 UK 0 7
1998 UK 0 8
1991 FR 0 1
1992 FR 1 2
1993 FR 1 3
1994 FR 0 4
1995 FR 0 5
1996 FR 1 6
1997 FR 0 7
1998 FR 0 8
1991 IT 1 1
1992 IT 0 2
1993 IT 0 3
1994 IT 0 4
1995 IT 0 5
1996 IT 1 6
1997 IT 0 7
1998 IT 0 8
I need to create two more variables: a conditional time variable, similar to time but 'resetting the clock' each time an event occurs; and a sequence variable that indicates which sequence or stage the country is in, i.e., whether the next event would be the first, second, third... (the number should increase on the row after the event). Thus, the data would look like this:
year ID event time cond.time sequence
1991 UK 0 1 1 1
1992 UK 0 2 2 1
1993 UK 0 3 3 1
1994 UK 0 4 4 1
1995 UK 0 5 5 1
1996 UK 0 6 6 1
1997 UK 0 7 7 1
1998 UK 0 8 8 1
1991 FR 0 1 1 1
1992 FR 1 2 2 1
1993 FR 1 3 1 2
1994 FR 0 4 1 3
1995 FR 0 5 2 3
1996 FR 1 6 3 3
1997 FR 0 7 1 4
1998 FR 0 8 2 4
1991 IT 1 1 1 1
1992 IT 0 2 1 2
1993 IT 0 3 2 2
1994 IT 0 4 3 2
1995 IT 0 5 4 2
1996 IT 1 6 5 2
1997 IT 0 7 1 3
1998 IT 0 8 2 3
Does anyone know how this could be done efficiently? I was trying to do it with the ddply function, but couldn't figure out how.
You can use the data.table package. If df is your original data.frame:
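library(magrittr)   # provides the %>% pipe
library(data.table)
dt = data.table(df)
# temp: the previous row's event within each ID (0 on the first row).
# shift() is data.table's lag; stats::lag() does not shift a plain vector.
dt[, temp := shift(event, 1L, fill = 0L), by = ID]
# sequence: 1 + the number of events in earlier rows of the same ID
dt[, sequence := cumsum(temp) + 1L, by = ID]
# func: lengths of the spells in x; a new spell starts on the first row
# and on every row that follows an event
func = function(x)
{
  which(c(1, head(x, -1)) %in% 1) %>%
    c(length(x) + 1) %>%
    diff
}
# cond.time: count 1, 2, ... within each spell
dt[, cond.time := func(event) %>% lapply(seq_len) %>% unlist, by = ID]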
> dt
year ID event time temp sequence cond.time
1: 1991 UK 0 1 0 1 1
2: 1992 UK 0 2 0 1 2
3: 1993 UK 0 3 0 1 3
4: 1994 UK 0 4 0 1 4
5: 1995 UK 0 5 0 1 5
6: 1996 UK 0 6 0 1 6
7: 1997 UK 0 7 0 1 7
8: 1998 UK 0 8 0 1 8
9: 1991 FR 0 1 0 1 1
10: 1992 FR 1 2 0 1 2
11: 1993 FR 1 3 1 2 1
12: 1994 FR 0 4 1 3 1
13: 1995 FR 0 5 0 3 2
14: 1996 FR 1 6 0 3 3
15: 1997 FR 0 7 1 4 1
16: 1998 FR 0 8 0 4 2
17: 1991 IT 1 1 0 1 1
18: 1992 IT 0 2 1 2 1
19: 1993 IT 0 3 0 2 2
20: 1994 IT 0 4 0 2 3
21: 1995 IT 0 5 0 2 4
22: 1996 IT 1 6 0 2 5
23: 1997 IT 0 7 1 3 1
24: 1998 IT 0 8 0 3 2
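Once sequence exists, the clock reset can also be written as a row counter within each (ID, sequence) group, which avoids the helper function. A minimal sketch, assuming the rows are already sorted by year within each ID; the small dt built here is only a stand-in for the FR and IT rows of the example data:

```r
library(data.table)

# Stand-in for the FR and IT rows of the example data (illustrative only).
dt <- data.table(
  ID    = rep(c("FR", "IT"), each = 8),
  year  = rep(1991:1998, times = 2),
  event = c(0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L,   # FR
            1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L)   # IT
)

# sequence: 1 + the number of events in earlier rows of the same ID.
dt[, sequence := cumsum(shift(event, 1L, fill = 0L)) + 1L, by = ID]

# cond.time: row counter that restarts within each (ID, sequence) spell.
dt[, cond.time := seq_len(.N), by = .(ID, sequence)]

print(dt)
```

Because := updates by reference, the rows keep their original order, so no re-sorting is needed afterwards.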