Here is a sample of my df:
library(dplyr)
# Create the dataframe
df <- data.frame(
id = c('A', 'A', 'A', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D'),
supply_start_date = as.Date(c('2024-01-01', '2024-01-20', '2024-04-20', '2024-01-02', '2018-03-01', '2018-07-03', '2018-10-07', '2019-01-23', '2017-04-28', '2017-05-26', '2017-06-06')),
supply_qty = c(30, 60, 100, 100, 100, 100, 100, 100, 30, 30, 30)
)
It has supply start date and quantity supplied for 3 IDs. For each ID, I want to do the following 1) create supply end date 2) if a supply from the previous row lasts longer than the current supply start date, generate a new supply begin and new supply end date. for example for A, first supply ends on 1/30/2022 and second supply begins on 1/20/2022 i.e, there is an overlap of 10 days.In this scenario, new_supply_begin_date for row 2 is 1/31/2022. If a supply_start_date and previous supply_end_date does not overlap, then no modification is necessary. Evaluation is done by ID. This is what I have tried
# Add supply_end column
df <- df %>%
mutate(supply_end_date= supply_start_date + ( supply_qty - 1))
df2<-df %>%
arrange(id, supply_start_date) %>%
group_by(id) %>%
mutate(new_supply_start_date=as.Date(ifelse(row_number()>1 & (supply_start_date<=lag(supply_end_date,default = first(supply_end_date))+supply_qty-1),
lag(supply_end_date,default=first(supply_end_date)+1),
supply_start_date)),
new_supply_end_date=as.Date(new_supply_start_date+supply_qty-1)) %>%
ungroup()
df2
if you look at the last row, for example, the new_supply_start_date should be 2017-06-26 and not 2017-06-24. I think I need to iteratively modify new_supply_start_date then modify new_supply_end_date for each row but I am not sure how to achieve that. Any help/tip is much appreciated. Thanks
Here is a solution that iterates over the rows of each subset of the data frame. I used the split function to make the subsets and one of the map_() functions from purrr to modify each subset. I added a Row column to the output data frame to make it easier to track the subsets. That line of code can be deleted.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# Create the dataframe
df <- data.frame(
id = c('A', 'A', 'A', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D'),
supply_start_date = as.Date(c('2024-01-01', '2024-01-20', '2024-04-20', '2024-01-02', '2018-03-01', '2018-07-03', '2018-10-07', '2019-01-23', '2017-04-28', '2017-05-26', '2017-06-06')),
supply_qty = c(30, 60, 100, 100, 100, 100, 100, 100, 30, 30, 30)
)
# Add supply_end column
df <- df %>%
mutate(supply_end_date= supply_start_date + ( supply_qty - 1))
AdjDates <- function(DF) {
DF <- DF |> mutate(Row = row_number())
for (i in 1:nrow(DF)) {
if (i > 1) {
PrevEnd <- DF[i-1, 'supply_end_date']
if (PrevEnd >= DF[i, 'supply_start_date']) {
DF[i, 'supply_start_date'] = PrevEnd + 1
DF[i, 'supply_end_date'] = DF[i, 'supply_start_date'] + DF[i, 'supply_qty'] -1
}
}
}
return(DF)
}
df |> split(df$id) |> purrr::map_dfr(AdjDates)
#> id supply_start_date supply_qty supply_end_date Row
#> 1 A 2024-01-01 30 2024-01-30 1
#> 2 A 2024-01-31 60 2024-03-30 2
#> 3 A 2024-04-20 100 2024-07-28 3
#> 4 B 2024-01-02 100 2024-04-10 1
#> 5 C 2018-03-01 100 2018-06-08 1
#> 6 C 2018-07-03 100 2018-10-10 2
#> 7 C 2018-10-11 100 2019-01-18 3
#> 8 C 2019-01-23 100 2019-05-02 4
#> 9 D 2017-04-28 30 2017-05-27 1
#> 10 D 2017-05-28 30 2017-06-26 2
#> 11 D 2017-06-27 30 2017-07-26 3
Created on 2025-02-03 with reprex v2.1.1