Search code examples
rdateiterationlag

R Iteratively modify values based on previous row if a condition is met


Here is a sample of my df:

library(dplyr)

# Create the dataframe
df <- data.frame(
  id = c('A', 'A', 'A', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D'),
 supply_start_date = as.Date(c('2024-01-01', '2024-01-20', '2024-04-20', '2024-01-02', '2018-03-01', '2018-07-03', '2018-10-07', '2019-01-23', '2017-04-28', '2017-05-26', '2017-06-06')),
 supply_qty = c(30, 60, 100, 100, 100, 100, 100, 100, 30, 30, 30)
)

It has supply start date and quantity supplied for 3 IDs. For each ID, I want to do the following 1) create supply end date 2) if a supply from the previous row lasts longer than the current supply start date, generate a new supply begin and new supply end date. for example for A, first supply ends on 1/30/2022 and second supply begins on 1/20/2022 i.e, there is an overlap of 10 days.In this scenario, new_supply_begin_date for row 2 is 1/31/2022. If a supply_start_date and previous supply_end_date does not overlap, then no modification is necessary. Evaluation is done by ID. This is what I have tried

# Add supply_end column
df <- df %>%
  mutate(supply_end_date=  supply_start_date + ( supply_qty - 1))

df2<-df %>% 
  arrange(id, supply_start_date) %>% 
  group_by(id) %>% 
  mutate(new_supply_start_date=as.Date(ifelse(row_number()>1 & (supply_start_date<=lag(supply_end_date,default = first(supply_end_date))+supply_qty-1),
         lag(supply_end_date,default=first(supply_end_date)+1),
         supply_start_date)),
         new_supply_end_date=as.Date(new_supply_start_date+supply_qty-1)) %>% 
  ungroup()
df2

if you look at the last row, for example, the new_supply_start_date should be 2017-06-26 and not 2017-06-24. I think I need to iteratively modify new_supply_start_date then modify new_supply_end_date for each row but I am not sure how to achieve that. Any help/tip is much appreciated. Thanks


Solution

  • Here is a solution that iterates over the rows of each subset of the data frame. I used the split function to make the subsets and one of the map_() functions from purrr to modify each subset. I added a Row column to the output data frame to make it easier to track the subsets. That line of code can be deleted.

    library(dplyr)
    #> 
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:stats':
    #> 
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #> 
    #>     intersect, setdiff, setequal, union
    
    # Create the dataframe
    df <- data.frame(
      id = c('A', 'A', 'A', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D'),
      supply_start_date = as.Date(c('2024-01-01', '2024-01-20', '2024-04-20', '2024-01-02', '2018-03-01', '2018-07-03', '2018-10-07', '2019-01-23', '2017-04-28', '2017-05-26', '2017-06-06')),
      supply_qty = c(30, 60, 100, 100, 100, 100, 100, 100, 30, 30, 30)
    )
    # Add supply_end column
    df <- df %>%
      mutate(supply_end_date=  supply_start_date + ( supply_qty - 1))
    
    AdjDates <- function(DF) {
      DF <- DF |> mutate(Row = row_number())
      for (i in 1:nrow(DF)) {
        if (i > 1) {
          PrevEnd <- DF[i-1, 'supply_end_date']
          if (PrevEnd >= DF[i, 'supply_start_date']) {
            DF[i, 'supply_start_date'] = PrevEnd + 1
            DF[i, 'supply_end_date'] = DF[i, 'supply_start_date'] + DF[i, 'supply_qty'] -1
          }
        }
      }
      return(DF)
    }
    df |> split(df$id) |> purrr::map_dfr(AdjDates)
    #>    id supply_start_date supply_qty supply_end_date Row
    #> 1   A        2024-01-01         30      2024-01-30   1
    #> 2   A        2024-01-31         60      2024-03-30   2
    #> 3   A        2024-04-20        100      2024-07-28   3
    #> 4   B        2024-01-02        100      2024-04-10   1
    #> 5   C        2018-03-01        100      2018-06-08   1
    #> 6   C        2018-07-03        100      2018-10-10   2
    #> 7   C        2018-10-11        100      2019-01-18   3
    #> 8   C        2019-01-23        100      2019-05-02   4
    #> 9   D        2017-04-28         30      2017-05-27   1
    #> 10  D        2017-05-28         30      2017-06-26   2
    #> 11  D        2017-06-27         30      2017-07-26   3
    

    Created on 2025-02-03 with reprex v2.1.1