R Iteratively modify values based on previous row if a condition is met

Here is a sample of my df:

library(dplyr)

# Create the dataframe
df <- data.frame(
  id = c('A', 'A', 'A', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D'),
 supply_start_date = as.Date(c('2024-01-01', '2024-01-20', '2024-04-20', '2024-01-02', '2018-03-01', '2018-07-03', '2018-10-07', '2019-01-23', '2017-04-28', '2017-05-26', '2017-06-06')),
 supply_qty = c(30, 60, 100, 100, 100, 100, 100, 100, 30, 30, 30)
)

It has supply start date and quantity supplied for 3 IDs. For each ID, I want to do the following 1) create supply end date 2) if a supply from the previous row lasts longer than the current supply start date, generate a new supply begin and new supply end date. for example for A, first supply ends on 1/30/2022 and second supply begins on 1/20/2022 i.e, there is an overlap of 10 days.In this scenario, new_supply_begin_date for row 2 is 1/31/2022. If a supply_start_date and previous supply_end_date does not overlap, then no modification is necessary. Evaluation is done by ID. This is what I have tried

# Add supply_end column
df <- df %>%
  mutate(supply_end_date=  supply_start_date + ( supply_qty - 1))

df2<-df %>% 
  arrange(id, supply_start_date) %>% 
  group_by(id) %>% 
  mutate(new_supply_start_date=as.Date(ifelse(row_number()>1 & (supply_start_date<=lag(supply_end_date,default = first(supply_end_date))+supply_qty-1),
         lag(supply_end_date,default=first(supply_end_date)+1),
         supply_start_date)),
         new_supply_end_date=as.Date(new_supply_start_date+supply_qty-1)) %>% 
  ungroup()
df2

if you look at the last row, for example, the new_supply_start_date should be 2017-06-26 and not 2017-06-24. I think I need to iteratively modify new_supply_start_date then modify new_supply_end_date for each row but I am not sure how to achieve that. Any help/tip is much appreciated. Thanks

Solution

Here is a solution that iterates over the rows of each subset of the data frame. I used the split function to make the subsets and one of the map_() functions from purrr to modify each subset. I added a Row column to the output data frame to make it easier to track the subsets. That line of code can be deleted.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Create the dataframe
df <- data.frame(
  id = c('A', 'A', 'A', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D'),
  supply_start_date = as.Date(c('2024-01-01', '2024-01-20', '2024-04-20', '2024-01-02', '2018-03-01', '2018-07-03', '2018-10-07', '2019-01-23', '2017-04-28', '2017-05-26', '2017-06-06')),
  supply_qty = c(30, 60, 100, 100, 100, 100, 100, 100, 30, 30, 30)
)
# Add supply_end column
df <- df %>%
  mutate(supply_end_date=  supply_start_date + ( supply_qty - 1))

AdjDates <- function(DF) {
  DF <- DF |> mutate(Row = row_number())
  for (i in 1:nrow(DF)) {
    if (i > 1) {
      PrevEnd <- DF[i-1, 'supply_end_date']
      if (PrevEnd >= DF[i, 'supply_start_date']) {
        DF[i, 'supply_start_date'] = PrevEnd + 1
        DF[i, 'supply_end_date'] = DF[i, 'supply_start_date'] + DF[i, 'supply_qty'] -1
      }
    }
  }
  return(DF)
}
df |> split(df$id) |> purrr::map_dfr(AdjDates)
#>    id supply_start_date supply_qty supply_end_date Row
#> 1   A        2024-01-01         30      2024-01-30   1
#> 2   A        2024-01-31         60      2024-03-30   2
#> 3   A        2024-04-20        100      2024-07-28   3
#> 4   B        2024-01-02        100      2024-04-10   1
#> 5   C        2018-03-01        100      2018-06-08   1
#> 6   C        2018-07-03        100      2018-10-10   2
#> 7   C        2018-10-11        100      2019-01-18   3
#> 8   C        2019-01-23        100      2019-05-02   4
#> 9   D        2017-04-28         30      2017-05-27   1
#> 10  D        2017-05-28         30      2017-06-26   2
#> 11  D        2017-06-27         30      2017-07-26   3

^{Created on 2025-02-03 with reprex v2.1.1}