Sample Data
set.seed(1)
library(tidyverse)
df1 <- data.frame(
  Category = rep(c("Cat1", "Cat2", "Cat3"), 3),
  Value = c(sample(1:10, 3), rep(NA, 6))
)
I'm trying to seed a data frame with the lagged values of data from past years. This is a simplified version of the problem, but in effect I need lag() to re-use the previously calculated lag value. If you run the code below, rows 4-6 are filled in as I intend, but rows 7-9 remain NA because lag() looks at the original values, not the newly calculated ones. I'd like rows 7-9 to be populated with the values of rows 4-6 as well. I know I could just write a for loop to pull the values forward, but I wanted to see if there is a more R-like way to accomplish this.
df1 %>%
  group_by(Category) %>%
  mutate(Value = ifelse(is.na(Value), lag(Value, 1), Value))
# Groups: Category [3]
Category Value
<fct> <int>
1 Cat1 9
2 Cat2 4
3 Cat3 7
4 Cat1 9
5 Cat2 4
6 Cat3 7
7 Cat1 NA
8 Cat2 NA
9 Cat3 NA
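To see why rows 7-9 stay NA, it may help to look at a single group in isolation. The sketch below uses a hypothetical vector x standing in for the Cat1 values; it is only an illustration of how lag() and ifelse() evaluate, not part of the original code.

```r
library(dplyr)

# Hypothetical stand-in for one group's Value column (e.g. Cat1)
x <- c(9L, NA, NA)

# lag() shifts the original vector by one position, all at once
shifted <- lag(x, 1)
shifted
# [1] NA  9 NA

# ifelse() then picks element-wise between x and the shifted copy;
# the third element is NA in both, so it stays NA
filled <- ifelse(is.na(x), shifted, x)
filled
# [1]  9  9 NA
```

Because the shift is computed once from the original vector, a value filled into position 2 is never seen when position 3 is evaluated.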
Desired Result
# A tibble: 9 x 2
# Groups: Category [3]
Category Value
<fct> <int>
1 Cat1 9
2 Cat2 4
3 Cat3 7
4 Cat1 9
5 Cat2 4
6 Cat3 7
7 Cat1 9
8 Cat2 4
9 Cat3 7
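Not sure if this is applicable to your problem, but you could maybe use fill() from tidyr? With .direction = "down" it carries the last non-missing value forward within each group, so the filled-in rows are re-used in turn.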
library(dplyr)
library(tidyr)
df1 %>%
  group_by(Category) %>%
  fill(Value, .direction = "down")
# A tibble: 9 x 2
# Groups: Category [3]
Category Value
<chr> <int>
1 Cat1 9
2 Cat2 4
3 Cat3 7
4 Cat1 9
5 Cat2 4
6 Cat3 7
7 Cat1 9
8 Cat2 4
9 Cat3 7
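If tidyr is not an option, the same "carry the last value forward" idea can be sketched with a small vectorized helper. The function name locf below is hypothetical (it is not a dplyr or tidyr function), and df1 is rebuilt with the sampled values written out explicitly so the example does not depend on the random seed.

```r
library(dplyr)

# Same shape as the sample data, with the sampled values hard-coded
df1 <- data.frame(
  Category = rep(c("Cat1", "Cat2", "Cat3"), 3),
  Value = c(9L, 4L, 7L, rep(NA, 6))
)

# Hypothetical helper: last observation carried forward, no explicit loop
locf <- function(x) {
  idx <- cumsum(!is.na(x))        # how many non-NA values seen so far
  c(NA, x[!is.na(x)])[idx + 1]    # idx 0 (leading NAs) maps to NA
}

res <- df1 %>%
  group_by(Category) %>%
  mutate(Value = locf(Value)) %>%
  ungroup()

res$Value
# [1] 9 4 7 9 4 7 9 4 7
```

For this data the result should match what fill() gives; fill() is probably still the simpler choice when tidyr is available.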