r, dplyr, lubridate

Incrementally increasing date using lag function for missing dates only


In the example below, I would like to derive the missing dates from the previous row's date plus the integer number of days carried in the effort variable. This should apply only to rows where the date is missing.

# Libraries
library("tidyverse")
library("lubridate")

work_start_date <- dmy("2/11/2020")

dta_tasks <- tribble(
  ~task_no, ~task,  ~effort,
  1,   "Task 1", NA,
  1.1, "Task 1.1", 1,
  1.3, "Task 1.3", 1,
  1.4, "Task 1.4", 2,
  1.5, "Task 1.5", 1,
  2,   "Task 2",   NA,
  2.1, "Task 2.1", 2
)

dta_tasks %>%
  arrange(task_no) %>%
  mutate(start_date = if_else(row_number() == 1, work_start_date, NA_Date_),
         start_date = if_else(is.na(start_date), lag(start_date) + days(effort), start_date))

Desired results

  task_no task     effort start_date
    <dbl> <chr>     <dbl> <date>
1     1   Task 1       NA 2020-11-02
2     1.1 Task 1.1      1 2020-11-03
3     1.3 Task 1.3      1 2020-11-04
4     1.4 Task 1.4      2 2020-11-06
5     1.5 Task 1.5      1 2020-11-07
6     2   Task 2       NA 2020-11-08
7     2.1 Task 2.1      2 2020-11-08  # For an NA effort, the value has to be skipped

Elaboration

In the context of the code below, I would like to replace Sys.Date() with the previously calculated date.

dta_tasks %>%
  arrange(task_no) %>%
  mutate(
    start_date = if_else(row_number() == 1, work_start_date, NA_Date_),
    start_date = if_else(is.na(start_date), Sys.Date() + days(effort), start_date)
  )

Solution

  • Try this:

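    dta_tasks %>%
        arrange(task_no) %>%
        # Treat missing effort as 0 days so it does not break the running total
        mutate(effort_no_na = pmax(effort, 0, na.rm = TRUE)) %>%
        mutate(cum_effort = cumsum(effort_no_na),
               # Offset from the start by the cumulative effort, not the per-row effort
               start_date = work_start_date + days(cum_effort),
               # Blank out the rows whose effort is NA ...
               start_date = if_else(is.na(effort), NA_Date_, start_date)) %>%
        # ... and give them the start date of the next task below
        fill(start_date, .direction = "up")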
    

    The idea is to use cumsum to track the total effort since the beginning. The extra steps (replacing NA with 0, blanking those dates again, then filling upward) are bookkeeping needed because of the NAs.
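
To make the cumsum idea easier to inspect in isolation, here is a minimal base R sketch. It is not the pipeline above: it leaves out the NA_Date_ and fill() step, so rows with missing effort simply keep the date reached so far. The vector effort and the start date are copied from the question; the intermediate names are only illustrative.

```r
# Base R only, so the sketch runs without loading any packages.
work_start_date <- as.Date("2020-11-02")
effort <- c(NA, 1, 1, 2, 1, NA, 2)

# Count a missing effort as zero days so it does not turn the running total into NA.
effort_no_na <- ifelse(is.na(effort), 0, effort)

# Running total: number of days elapsed since work_start_date at each row.
cum_effort <- cumsum(effort_no_na)

# Adding a number to a Date shifts it by that many days.
start_date <- work_start_date + cum_effort

print(data.frame(effort, cum_effort, start_date))
```

Because cumsum carries the total forward, no row needs to look up the previously computed date explicitly, which is what the lag() attempt in the question was trying to do. How the rows with NA effort should be dated is a separate decision, which the answer handles with fill().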