Tags: r, dplyr, lag

Can the lag function in R re-use previously calculated values without looping?


Sample Data

set.seed(1)
library(tidyverse)

# Three categories, each with one observed value followed by two missing values
df1 <- data.frame(
  Category = rep(c("Cat1","Cat2","Cat3"),3),
  Value = c(sample(1:10, 3), rep(NA, 6))
)

I'm trying to seed a data frame with lagged values from past years. This is a simplified version of the problem, but in effect I need lag to re-use the value it previously calculated. If you run the code below, rows 4-6 are filled as I intend, but rows 7-9 remain NA: lag() shifts the original Value column in one vectorized step, so the lag for rows 7-9 is the original NA from rows 4-6, not the newly calculated value. I'd like rows 7-9 to take the values of rows 4-6 as well. I know I could write a for loop to carry the values forward, but I wanted to see if there is a more idiomatic R way to do this.

df1 %>% group_by(Category) %>% 
  mutate(Value = ifelse(is.na(Value), lag(Value, 1), Value))

# A tibble: 9 x 2
# Groups:   Category [3]
  Category Value
  <fct>    <int>
1 Cat1         9
2 Cat2         4
3 Cat3         7
4 Cat1         9
5 Cat2         4
6 Cat3         7
7 Cat1        NA
8 Cat2        NA
9 Cat3        NA
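Reducing the problem to a single group shows why the last rows stay NA. In this sketch, x is a hand-made vector standing in for Cat1's values (an illustration, not the asker's data frame). dplyr::lag() shifts the whole original vector at once, so it never sees the values that ifelse() fills in afterwards.

```r
library(dplyr)

# Stand-in for one group (e.g. Cat1): one observed value, then two gaps.
x <- c(9L, NA, NA)

# lag() shifts the ORIGINAL vector in a single vectorized step...
lagged <- lag(x, 1)
lagged
#> [1] NA  9 NA

# ...so ifelse() only fills the first gap: the third element's lag is the
# original NA, not the 9 that was just computed for the second element.
result <- ifelse(is.na(x), lagged, x)
result
#> [1]  9  9 NA
```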

Desired Result

# A tibble: 9 x 2
# Groups:   Category [3]
  Category Value
  <fct>    <int>
1 Cat1         9
2 Cat2         4
3 Cat3         7
4 Cat1         9
5 Cat2         4
6 Cat3         7
7 Cat1         9
8 Cat2         4
9 Cat3         7
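For comparison, the explicit loop mentioned in the question could look roughly like this. It's a sketch that assumes the rows within each Category are already in chronological order. Each iteration reads values written by earlier iterations, which is exactly the behaviour the vectorized lag() call lacks.

```r
set.seed(1)
df1 <- data.frame(
  Category = rep(c("Cat1", "Cat2", "Cat3"), 3),
  Value    = c(sample(1:10, 3), rep(NA, 6))
)

filled <- df1
for (i in seq_len(nrow(filled))) {
  if (is.na(filled$Value[i])) {
    # Indices of earlier rows that belong to the same Category
    prev <- which(filled$Category[seq_len(i - 1)] == filled$Category[i])
    if (length(prev) > 0) {
      # Reads the already-updated value, so the fill propagates down
      filled$Value[i] <- filled$Value[max(prev)]
    }
  }
}
filled
```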

Solution

  • Not sure if this applies to your real problem, but for this example you could use tidyr's fill(), which carries the last non-missing value down within each group:

    library(dplyr)
    library(tidyr)
    df1 %>% 
      group_by(Category) %>% 
      # fill() works within each group; "down" copies the last non-NA value forward
      fill(Value, .direction = "down")
    
    # A tibble: 9 x 2
    # Groups:   Category [3]
      Category Value
      <chr>    <int>
    1 Cat1         9
    2 Cat2         4
    3 Cat3         7
    4 Cat1         9
    5 Cat2         4
    6 Cat3         7
    7 Cat1         9
    8 Cat2         4
    9 Cat3         7
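If the carried-forward value ever needs to be transformed at each step (something fill() can't do), one loop-free option is base R's Reduce() with accumulate = TRUE, which feeds each computed result into the next step. This is a swapped-in technique, not part of the answer above, and `step` is a hypothetical helper; as written it reproduces the fill() result.

```r
library(dplyr)

set.seed(1)
df1 <- data.frame(
  Category = rep(c("Cat1", "Cat2", "Cat3"), 3),
  Value    = c(sample(1:10, 3), rep(NA, 6))
)

# Hypothetical helper: given the previously computed value and the current
# raw value, keep observed values and otherwise carry the previous result.
step <- function(prev, cur) if (is.na(cur)) prev else cur

out <- df1 %>%
  group_by(Category) %>%
  # accumulate = TRUE returns every intermediate result, one per row
  mutate(Value = Reduce(step, Value, accumulate = TRUE)) %>%
  ungroup()
out
```

Because `step` receives the previously computed value, a recursive rule (for example growing the previous value by a fixed rate) only requires changing that one function.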