I have a dataset similar to the following in R. Each subject has one row:
> ( fake = data.frame(id=c(1,2,3), x=c(42,61,50), event=c(0,0,1), followup=c(6,2,12)) )
  id  x event followup
1  1 42     0        6
2  2 61     0        2
3  3 50     1       12
I want to split the dataset into intervals defined by the observed event times:
  id  x event start.time stop.time
1  1 42     0          0         2
2  1 42     0          2         6
3  2 61     0          0         2
4  3 50     0          0         2
5  3 50     0          2         6
6  3 50     1          6        12
So each subject receives one interval for each of these times, up to and including his own followup time. Subject 3, who had the event at time 12, gets event = 0 for the earlier intervals, when he was still alive, and event = 1 only for the last one.
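A minimal sketch of that rule for a single subject is shown here. It assumes the cut points are the distinct followup times (2, 6 and 12 in `fake`), which is what the desired output uses even though only 12 is an event time. `split_one` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: split one subject's followup at the given cut points.
split_one <- function(followup, event, cuts) {
  # Interval ends: every cut point before the followup, then the followup itself
  stops  <- c(cuts[cuts < followup], followup)
  # Each interval starts where the previous one ended; the first starts at 0
  starts <- c(0, head(stops, -1))
  data.frame(start.time = starts,
             stop.time  = stops,
             # 0 for every interval the subject survived, own status on the last one
             event      = c(rep(0, length(stops) - 1), event))
}

cuts <- sort(unique(c(6, 2, 12)))  # distinct followup times from `fake`
split_one(12, 1, cuts)             # subject 3: intervals 0-2, 2-6, 6-12
```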
How should I do this? The actual dataset has about 20,000 rows and 900 unique event times.
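The conditions are not entirely clear. Judging from the desired output, the intervals are cut at every distinct followup time (2, 6 and 12), not only at the event time 12, so that is what the code below assumes.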
# Cut points: 0 plus every distinct followup time (0, 2, 6, 12 for `fake`)
cuts <- c(0, sort(unique(fake$followup)))
res <- do.call(rbind, lapply(split(fake, fake$id), function(x) {
  # Boundaries for this subject: from 0 up to and including his own followup
  b <- cuts[cuts <= x$followup]
  n <- length(b) - 1
  # Consecutive boundaries form the (start.time, stop.time) pairs
  x2 <- data.frame(id = x$id, x = x$x, event = 0,
                   start.time = b[-(n + 1)], stop.time = b[-1])
  # Only the last interval carries the subject's event status
  x2$event[n] <- x$event
  x2
}))
row.names(res) <- 1:nrow(res)
res
#  id  x event start.time stop.time
#1  1 42     0          0         2
#2  1 42     0          2         6
#3  2 61     0          0         2
#4  3 50     0          0         2
#5  3 50     0          2         6
#6  3 50     1          6        12
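For the full data (about 20,000 subjects and 900 times), building one small data frame per subject and `rbind`-ing them can get slow. Here is a minimal vectorized sketch in base R. By default it assumes the same cut points as the answer (every distinct followup time). `split_at_cuts` is a hypothetical name. It counts the cut points below each followup with `findInterval()` (its `left.open` argument needs R 3.3.0 or later), then expands the rows in one step with `rep()` and `sequence()`.

```r
# Hypothetical helper: vectorized split of one-row-per-subject data into
# (start.time, stop.time] intervals, without a per-subject loop.
split_at_cuts <- function(df, cuts = df$followup) {
  cuts <- sort(unique(cuts[cuts > 0]))
  # Number of cut points strictly before each subject's followup
  n_before <- findInterval(df$followup, cuts, left.open = TRUE)
  n_int    <- n_before + 1L                   # intervals per subject
  row_idx  <- rep(seq_len(nrow(df)), n_int)   # repeat each subject's row
  pos      <- sequence(n_int)                 # 1, 2, ..., n_int within a subject
  is_last  <- pos == n_int[row_idx]

  # Interior intervals end at a cut point; the last one ends at the followup
  stop_time <- cuts[pos]
  stop_time[is_last] <- df$followup[row_idx][is_last]
  # The first interval starts at 0, later ones at the previous cut point
  start_time <- c(0, cuts)[pos]

  out <- df[row_idx, setdiff(names(df), c("event", "followup")), drop = FALSE]
  out$event      <- ifelse(is_last, df$event[row_idx], 0)
  out$start.time <- start_time
  out$stop.time  <- stop_time
  rownames(out) <- NULL
  out
}

fake <- data.frame(id = c(1, 2, 3), x = c(42, 61, 50),
                   event = c(0, 0, 1), followup = c(6, 2, 12))
split_at_cuts(fake)  # should match `res` from the answer
```

To cut only at true event times instead, pass `cuts = fake$followup[fake$event == 1]`. Keep in mind that the result has one row per subject and interval, so with 900 cut points it can approach 20,000 × 900 = 18 million rows. The survival package's `survSplit()` is designed for this kind of episode splitting and is worth comparing against.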