I am trying to use a combination of conditional lagging and then filling values down by group. In my data, I have old_price and new_price. The new_price must always be lower than old_price. Whenever new_price is greater than old_price, I would like to lag back to the most recent value where new_price was less than old_price. In the case of Raleigh, rows 2 and 3 should lag back to 36.00. Row 4 should not lag back since new_price is already lower than old_price. When I have tried using lag, it has been applying it to row 2 (where the price is 52), but then leaving row 3 as 54.00. I would like row 3 to also lag from row 1, or from row 2 once it has the correct value.
Here is my data:
city sku month year old_price new_price
Raleigh 001 Dec 2021 45.00 36.00
Raleigh 001 Jan 2022 45.00 52.00
Raleigh 001 Feb 2022 45.00 54.00
Raleigh 001 Mar 2022 45.00 37.00
Austin 002 Dec 2021 37.50 30.00
Austin 002 Jan 2022 37.50 32.00
Austin 002 Feb 2022 37.50 48.00
Desired output:
city sku month year old_price new_price
Raleigh 001 Dec 2021 45.00 36.00
Raleigh 001 Jan 2022 45.00 36.00
Raleigh 001 Feb 2022 45.00 36.00
Raleigh 001 Mar 2022 45.00 37.00
Austin 002 Dec 2021 37.50 30.00
Austin 002 Jan 2022 37.50 32.00
Austin 002 Feb 2022 37.50 32.00
One approach is to convert values where new_price > old_price
to NA
and then fill down.
library(dplyr)
library(tidyr)
df %>%
mutate(new_price = if_else(new_price > old_price, NA_real_, new_price)) %>%
fill(new_price)
Output:
city sku month year old_price new_price
1 Raleigh 1 Dec 2021 45.0 36
2 Raleigh 1 Jan 2022 45.0 36
3 Raleigh 1 Feb 2022 45.0 36
4 Raleigh 1 Mar 2022 45.0 37
5 Austin 2 Dec 2021 37.5 30
6 Austin 2 Jan 2022 37.5 32
7 Austin 2 Feb 2022 37.5 32
Data:
df <- read.table(textConnection("city sku month year old_price new_price
Raleigh 001 Dec 2021 45.00 36.00
Raleigh 001 Jan 2022 45.00 52.00
Raleigh 001 Feb 2022 45.00 54.00
Raleigh 001 Mar 2022 45.00 37.00
Austin 002 Dec 2021 37.50 30.00
Austin 002 Jan 2022 37.50 32.00
Austin 002 Feb 2022 37.50 48.00"), header = TRUE)