I have this data frame:
have <- data.frame(ID = c(1, 1, 1, 2, 3),
tstart = c(0, 100, 200, 0, 0),
tstop = c(100, 200, 500, 400, 300),
tdc = c(0,1,1,0,1))
> have
ID tstart tstop tdc
1 1 0 100 0
2 1 100 200 1
3 1 200 500 1
4 2 0 400 0
5 3 0 300 1
I would like to add a column with the cumulative sum of time from the interval tstart to tstop, conditional on tdc=1, i.e.,
want <- data.frame(ID = c(1, 1, 1, 2, 3),
tstart = c(0, 100, 200, 0, 0),
tstop = c(100, 200, 500, 400, 300),
tdc = c(0,1,1,0,1),
tdc.cum = c(0,100,400,0,300))
> want
ID tstart tstop tdc tdc.cum
1 1 0 100 0 0
2 1 100 200 1 100
3 1 200 500 1 400
4 2 0 400 0 0
5 3 0 300 1 300
It would also be helpful to see how to lag the cumulative time by 10 units, i.e., subtract 10 units from each ID's total cumulative sum of rows with tdc=1.
want2 <- data.frame(ID = c(1, 1, 1, 2, 3),
tstart = c(0, 100, 200, 0, 0),
tstop = c(100, 200, 500, 400, 300),
tdc = c(0,1,1,0,1),
tdc.cum = c(0,100,390,0,290))
> want2
ID tstart tstop tdc tdc.cum
1 1 0 100 0 0
2 1 100 200 1 100
3 1 200 500 1 390
4 2 0 400 0 0
5 3 0 300 1 290
I have tried to set up this data frame using survival::tmerge() and cumtdc() but I have only been able to get the cumulative sum of tdc (1 or 0) instead of the time interval. Thank you.
Using dplyr
, this will achieve what you need. First, it calculates the cumulative sum of start/stop differences by group (ID). Second, if the last value in tdc for each group != 0, subtract 10 from last value in "tdc.cum":
library(dplyr)
df <- data.frame(ID = c(1, 1, 1, 2, 3),
tstart = c(0, 100, 200, 0, 0),
tstop = c(100, 200, 500, 400, 300),
tdc = c(0,1,1,0,1))
df %>%
group_by(ID) %>%
mutate(tdc.cum = cumsum(ifelse(tdc == 1, tstop - tstart, 0)),
tdc.cum = ifelse(tdc.cum == max(tdc.cum) & tdc != 0,
tdc.cum - 10, tdc.cum)) %>%
ungroup()
# A tibble: 5 × 5
ID tstart tstop tdc tdc.cum
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 100 0 0
2 1 100 200 1 100
3 1 200 500 1 390
4 2 0 400 0 0
5 3 0 300 1 290