Search code examples
rlag

lag function in tidyverse when the starting value is in a different column


I'm trying to do an exponentially weighted moving average (EWMA) for the value of some store items. For the first value in the EWMA column, I require it to be the first value of the item on day 1 and then I want the EWMA to use that value initially and then go from there. However, I keep getting stuck and getting a bunch of NA's. In excel, I'd normally just set the first cell of the EWMA column equal to the first cell of the value for each item and then begin by equation in the second row.

Here is an example of what my data looks like:

library(tidyverse)

## simulate data
set.seed(1)
item <- rep(LETTERS[1:3], each = 50)
day <- rep(1:50, times = 3)
value <- c(
  round(rnorm(n = 20, mean = 120, sd = 40), 2),
  round(rnorm(n = 10, mean = 150, sd = 20), 2),
  round(rnorm(n = 20, mean = 110, sd = 30), 2),
  round(rnorm(n = 20, mean = 120, sd = 40), 2),
  round(rnorm(n = 10, mean = 150, sd = 20), 2),
  round(rnorm(n = 20, mean = 110, sd = 30), 2),
  round(rnorm(n = 20, mean = 120, sd = 40), 2),
  round(rnorm(n = 10, mean = 150, sd = 20), 2),
  round(rnorm(n = 20, mean = 110, sd = 30), 2))


df <- data.frame(item, day, value)
df %>% head()

  item day  value
1    A   1  94.94
2    A   2 127.35
3    A   3  86.57
4    A   4 183.81
5    A   5 133.18
6    A   6  87.18

For the EWMA, I'm using the equation lamda * value + (1 - lamda) * lag(EWMA), again the caveat being that I want the EWMA row for each item to be the starting value on day 1.

Here is what I've tried:

lamda <- 0.3


df <- df %>%
  group_by(item) %>%
  mutate(ewma = ifelse(day == 1, value, NA),
         ewma = lamda*value + ((1 - lamda) * lag(ewma)))

And this is what it has produced:

# A tibble: 150 x 4
# Groups:   item [3]
   item    day value  ewma
   <fct> <int> <dbl> <dbl>
 1 A         1  94.9  NA  
 2 A         2 127.   105.
 3 A         3  86.6  NA  
 4 A         4 184.   NA  
 5 A         5 133.   NA  
 6 A         6  87.2  NA  
 7 A         7 140.   NA  
 8 A         8 150.   NA  
 9 A         9 143.   NA  
10 A        10 108.   NA  

If, for example, I'm running this in excel, I would expect those first 10 rows to look like this:

   item day  value  ewma
1     A   1  94.94  94.9
2     A   2 127.35 104.5
3     A   3  86.57  99.2
4     A   4 183.81 124.6
5     A   5 133.18 127.1
6     A   6  87.18 115.1
7     A   7 139.50 122.6
8     A   8 149.53 130.8
9     A   9 143.03 134.5
10    A  10 107.78 126.5

Is there an efficient way to create this in tidyverse?


Solution

  • Each computation of ewma will become the input of the next computation. It's a typical case of reduce() or accumulate() in purrr.

    library(dplyr)
    library(purrr)
    
    df %>%
      group_by(item) %>%
      mutate(ewma = accumulate(value, ~ lamda * .y + (1 - lamda) * .x))
    
    # # A tibble: 150 x 4
    # # Groups:   item [3]
    #    item    day value  ewma
    #    <fct> <int> <dbl> <dbl>
    #  1 A         1  94.9  94.9
    #  2 A         2 127.  105. 
    #  3 A         3  86.6  99.2
    #  4 A         4 184.  125. 
    #  5 A         5 133.  127. 
    #  6 A         6  87.2 115. 
    #  7 A         7 140.  122. 
    #  8 A         8 150.  131. 
    #  9 A         9 143.  134. 
    # 10 A        10 108.  126.