In the example below, I would like to derive the missing dates from the previous row's date plus the integer value carried in the effort
variable, and only for the rows where the date is missing.
# Libraries
library("tidyverse")
library("lubridate")
work_start_date <- dmy("2/11/2020")
dta_tasks <- tribble(
~task_no, ~task, ~effort,
1, "Task 1", NA,
1.1, "Task 1.1", 1,
1.3, "Task 1.3", 1,
1.4, "Task 1.4", 2,
1.5, "Task 1.5", 1,
2, "Task 2", NA,
2.1, "Task 2.1", 2
)
dta_tasks %>%
  arrange(task_no) %>%
  mutate(start_date = if_else(row_number() == 1, work_start_date, NA_Date_),
         start_date = if_else(is.na(start_date), lag(start_date) + days(effort), start_date))
  task_no task     effort start_date
    <dbl> <chr>     <dbl> <date>
1     1   Task 1       NA 2020-11-02
2     1.1 Task 1.1      1 2020-11-03
3     1.3 Task 1.3      1 2020-11-04
4     1.4 Task 1.4      2 2020-11-06
5     1.5 Task 1.5      1 2020-11-07
6     2   Task 2       NA 2020-11-08
7     2.1 Task 2.1      2 2020-11-08 # For NA it has to skip value
In the code below, I would like to replace Sys.Date()
with the previously calculated date.
dta_tasks %>%
  arrange(task_no) %>%
  mutate(
    start_date = if_else(row_number() == 1, work_start_date, NA_Date_),
    start_date = if_else(is.na(start_date), Sys.Date() + days(effort), start_date)
  )
Try this:
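dta_tasks %>%
  arrange(task_no) %>%
  # Treat a missing effort as zero days so it does not break the running total
  mutate(effort_no_na = pmax(effort, 0, na.rm = TRUE)) %>%
  mutate(cum_effort = cumsum(effort_no_na),
         # Offset from the start date by the cumulative effort, not the row's own effort
         start_date = work_start_date + days(cum_effort),
         start_date = if_else(is.na(effort), NA_Date_, start_date)) %>%
  # Rows with missing effort take the next available date below them
  fill(start_date, .direction = "up")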
The idea is to use cumsum to track the total effort since the beginning.
Most of the remaining bookkeeping is there to handle the NAs in effort.
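To see the cumsum idea in isolation, here is a minimal base R sketch on the effort values from the question. It assumes that a missing effort should count as zero days, so a row with NA simply repeats the previous date; that is one reading of "it has to skip value" and may not be exactly what you want. The vector names are only illustrative.

```r
# Start date and effort values taken from the question
work_start_date <- as.Date("2020-11-02")
effort <- c(NA, 1, 1, 2, 1, NA, 2)

# Assumption: a missing effort contributes zero days
effort_no_na <- ifelse(is.na(effort), 0, effort)

# Running total of effort in days since the start date
cum_effort <- cumsum(effort_no_na)

# Adding a number to a Date shifts it by that many days
start_date <- work_start_date + cum_effort
print(data.frame(effort, cum_effort, start_date))
```

Because each date depends only on the running total, no row-by-row lookup of the previous result is needed, which is why lag() is not required here.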