
How to perform lag in R when there are multiple repeating rows for a group


Suppose I have a data frame as follows:

date price company
2000-10-01 18 A
2001-10-01 20 A
2001-10-01 20 A
2001-10-01 20 A

I want to create a new variable lagged_price as follows:

date price company lagged_price
2000-10-01 18 A NA
2001-10-01 20 A 18
2001-10-01 20 A 18
2001-10-01 20 A 18

The new variable, lagged_price, should hold the company's price on the previous date, not the price in the previous row. Using group_by(company) with lag() is problematic because lag() takes the value from the preceding row within the company group, so repeated rows for the same date pick up that same date's price. I also do not want to call distinct() on the original dataset: although that would work in this example, I need to keep the other rows.

My failed attempt:

    library(dplyr)

    # lag() here looks at the previous row within each company, not the previous date
    out <- data %>%
      group_by(company) %>%
      mutate(lagged_price = lag(price))
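To make the problem reproducible, here is a self-contained sketch of the attempt above. The way the example data is rebuilt as a tibble is an assumption (the original only shows the printed table), and price is stored as a plain numeric column.

```r
library(dplyr)

# Example data rebuilt from the table in the question (construction is assumed)
data <- tibble(
  date    = c("2000-10-01", "2001-10-01", "2001-10-01", "2001-10-01"),
  price   = c(18, 20, 20, 20),
  company = c("A", "A", "A", "A")
)

# Row-wise lag within company: repeated dates see the same date's price
out <- data %>%
  group_by(company) %>%
  mutate(lagged_price = lag(price)) %>%
  ungroup()

out$lagged_price
# NA 18 20 20  (wanted: NA 18 18 18)
```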

Any help is appreciated.


Solution

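  • Lag price first, then group by date and copy the first lagged value of each date to all of its rows. For the single-company example this gives: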

    df %>% 
      mutate(lagged_price = lag(price)) %>% 
      group_by(date) %>% 
      mutate(lagged_price = lagged_price[1]) %>% 
      ungroup()
    # A tibble: 4 × 4
      date       price company lagged_price
      <chr>      <int> <chr>          <int>
    1 2000-10-01    18 A                 NA
    2 2001-10-01    20 A                 18
    3 2001-10-01    20 A                 18
    4 2001-10-01    20 A                 18
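The answer above lags over the whole data frame and groups only by date, so with more than one company the lag would cross company boundaries. A possible variant for that case is sketched below. It is not the answerer's method: it builds a helper table with one row per company and date, lags within each company there, and joins the result back so every original row is kept. The second company "B" and its prices are made up for illustration, and the sketch assumes price is constant within a company and date, as in the example.

```r
library(dplyr)

# Hypothetical data: company A from the question plus an invented company B
df <- tibble(
  date    = as.Date(c("2000-10-01", "2001-10-01", "2001-10-01", "2001-10-01",
                      "2000-10-01", "2001-10-01")),
  price   = c(18, 20, 20, 20, 5, 7),
  company = c("A", "A", "A", "A", "B", "B")
)

# Helper table: one row per company and date, lagged within each company.
# Assumes a single price per company-date pair.
lagged <- df %>%
  distinct(company, date, price) %>%
  arrange(company, date) %>%
  group_by(company) %>%
  mutate(lagged_price = lag(price)) %>%
  ungroup() %>%
  select(company, date, lagged_price)

# Attach the lagged price to every original row; duplicates in df are preserved
out <- df %>%
  left_join(lagged, by = c("company", "date"))

out$lagged_price
# NA 18 18 18 NA 5
```

Here distinct() is applied only to the helper table, so the original data frame keeps all of its rows. If a company can have several different prices on the same date, the helper table would need an explicit rule (for example, the first or the mean price per date) before lagging.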