Search code examples
rdplyrlagrowwise

How to calculate the sequence of start and end dates efficiently in a tibble?


I have the following starting point:

#dataset:
schedule <- tibble(start = as.Date(c("2018-07-11", NA, NA)), duration = c(10,23,9),flag_StartActual = c(TRUE,FALSE,FALSE))

in table format:

> schedule
# A tibble: 3 x 3
  start      duration flag_StartActual
  <date>        <dbl> <lgl>           
1 2018-07-11       10 TRUE            
2 NA               23 FALSE           
3 NA                9 FALSE   

I would like to compute the end (= start + duration). Once that is done for the first row, I would like to make the end of the first row the start of the second.

I have experimented with various approaches but so far I have not been successful. The things I have considered is:

  • Getting the end from the previous row with the lag(end) function. This works fine for the second row, but for all the following rows the end does not exist yet.
  • I have experimented with rowwise() but in this case I do not get the lag() function to work.

The following code does more or less what I would like it to do but this is not very neat as a mutate needs to be added for every row (and then all the previous rows are recalculated).

> schedule %>%
+   mutate(
+     end = start + ddays(duration),
+     start = as_datetime(ifelse(flag_StartActual==TRUE,start,lag(end)))
+   )  %>%
+   mutate(
+     end = start + ddays(duration),
+     start = as_datetime(ifelse(flag_StartActual==TRUE,start,lag(end)))
+   )
# A tibble: 3 x 4
  start               duration flag_StartActual end                
  <dttm>                 <dbl> <lgl>            <dttm>             
1 2018-07-11 00:00:00       10 TRUE             2018-07-21 00:00:00
2 2018-07-21 00:00:00       23 FALSE            2018-08-13 00:00:00
3 2018-08-13 00:00:00        9 FALSE            NA                 

Including rowwise() in the code like below does not work:

schedule %>%
  rowwise() %>%
  mutate(
    end = start + ddays(duration),
    start = as_datetime(ifelse(flag_StartActual==TRUE,start,lag(end)))
  )

Anyway, I am a bit stuck and hope that somebody has some smart ideas on how to approach this?


Solution

  • library(dplyr)
    
    schedule %>% 
      mutate(start = start[1] + lag(cumsum(duration), default = 0)
             , end = start + duration)
    
    # # A tibble: 3 x 4
    #   start      duration flag_StartActual end       
    #   <date>        <dbl> <lgl>            <date>    
    # 1 2018-07-11    10.0  T                2018-07-21
    # 2 2018-07-21    23.0  F                2018-08-13
    # 3 2018-08-13     9.00 F                2018-08-22