Search code examples
rdplyrlag

Fill Lagged Values Down R


I am trying to use a combination of conditional lagging and then filling values down by group. In my data, I have old_price and new_price. The new_price must always be lower than old_price. Whenever new_price is greater than old_price, I would like to lag back to the most recent value where new_price was less than old_price. In the case of Raleigh, rows 2 and 3 should lag back to 36.00. Row 4 should not lag back since new_price is already lower than old_price. When I have tried using lag, it has been applying it to row 2 (where the price is 52), but then leaving row 3 as 54.00. I would like row 3 to also lag from row 1, or from row 2 once it has the correct value.

Here is my data:

city          sku  month  year  old_price   new_price
Raleigh       001  Dec    2021  45.00        36.00
Raleigh       001  Jan    2022  45.00        52.00
Raleigh       001  Feb    2022  45.00        54.00
Raleigh       001  Mar    2022  45.00        37.00
Austin        002  Dec    2021  37.50        30.00
Austin        002  Jan    2022  37.50        32.00
Austin        002  Feb    2022  37.50        48.00

Desired output:

city          sku  month  year  old_price   new_price 
Raleigh       001  Dec    2021  45.00        36.00
Raleigh       001  Jan    2022  45.00        36.00
Raleigh       001  Feb    2022  45.00        36.00
Raleigh       001  Mar    2022  45.00        37.00
Austin        002  Dec    2021  37.50        30.00
Austin        002  Jan    2022  37.50        32.00
Austin        002  Feb    2022  37.50        32.00

Solution

  • One approach is to convert values where new_price > old_price to NA and then fill down.

    library(dplyr)
    library(tidyr)
    
    df %>% 
      mutate(new_price = if_else(new_price > old_price, NA_real_, new_price)) %>% 
      fill(new_price)
    

    Output:

         city sku month year old_price new_price
    1 Raleigh   1   Dec 2021      45.0        36
    2 Raleigh   1   Jan 2022      45.0        36
    3 Raleigh   1   Feb 2022      45.0        36
    4 Raleigh   1   Mar 2022      45.0        37
    5  Austin   2   Dec 2021      37.5        30
    6  Austin   2   Jan 2022      37.5        32
    7  Austin   2   Feb 2022      37.5        32
    

    Data:

    df <- read.table(textConnection("city          sku  month  year  old_price   new_price
    Raleigh       001  Dec    2021  45.00        36.00
    Raleigh       001  Jan    2022  45.00        52.00
    Raleigh       001  Feb    2022  45.00        54.00
    Raleigh       001  Mar    2022  45.00        37.00
    Austin        002  Dec    2021  37.50        30.00
    Austin        002  Jan    2022  37.50        32.00
    Austin        002  Feb    2022  37.50        48.00"), header = TRUE)