I'm trying to do an exponentially weighted moving average (EWMA) for the value of some store items. For the first value in the EWMA column, I require it to be the first value of the item on day 1 and then I want the EWMA to use that value initially and then go from there. However, I keep getting stuck and getting a bunch of NA's. In excel, I'd normally just set the first cell of the EWMA column equal to the first cell of the value for each item and then begin by equation in the second row.
Here is an example of what my data looks like:
library(tidyverse)
## simulate data
set.seed(1)
item <- rep(LETTERS[1:3], each = 50)
day <- rep(1:50, times = 3)
value <- c(
round(rnorm(n = 20, mean = 120, sd = 40), 2),
round(rnorm(n = 10, mean = 150, sd = 20), 2),
round(rnorm(n = 20, mean = 110, sd = 30), 2),
round(rnorm(n = 20, mean = 120, sd = 40), 2),
round(rnorm(n = 10, mean = 150, sd = 20), 2),
round(rnorm(n = 20, mean = 110, sd = 30), 2),
round(rnorm(n = 20, mean = 120, sd = 40), 2),
round(rnorm(n = 10, mean = 150, sd = 20), 2),
round(rnorm(n = 20, mean = 110, sd = 30), 2))
df <- data.frame(item, day, value)
df %>% head()
item day value
1 A 1 94.94
2 A 2 127.35
3 A 3 86.57
4 A 4 183.81
5 A 5 133.18
6 A 6 87.18
For the EWMA, I'm using the equation lamda * value + (1 - lamda) * lag(EWMA)
, again the caveat being that I want the EWMA row for each item to be the starting value on day 1.
Here is what I've tried:
lamda <- 0.3
df <- df %>%
group_by(item) %>%
mutate(ewma = ifelse(day == 1, value, NA),
ewma = lamda*value + ((1 - lamda) * lag(ewma)))
And this is what it has produced:
# A tibble: 150 x 4
# Groups: item [3]
item day value ewma
<fct> <int> <dbl> <dbl>
1 A 1 94.9 NA
2 A 2 127. 105.
3 A 3 86.6 NA
4 A 4 184. NA
5 A 5 133. NA
6 A 6 87.2 NA
7 A 7 140. NA
8 A 8 150. NA
9 A 9 143. NA
10 A 10 108. NA
If, for example, I'm running this in excel, I would expect those first 10 rows to look like this:
item day value ewma
1 A 1 94.94 94.9
2 A 2 127.35 104.5
3 A 3 86.57 99.2
4 A 4 183.81 124.6
5 A 5 133.18 127.1
6 A 6 87.18 115.1
7 A 7 139.50 122.6
8 A 8 149.53 130.8
9 A 9 143.03 134.5
10 A 10 107.78 126.5
Is there an efficient way to create this in tidyverse
?
Each computation of ewma
will become the input of the next computation. It's a typical case of reduce()
or accumulate()
in purrr
.
library(dplyr)
library(purrr)
df %>%
group_by(item) %>%
mutate(ewma = accumulate(value, ~ lamda * .y + (1 - lamda) * .x))
# # A tibble: 150 x 4
# # Groups: item [3]
# item day value ewma
# <fct> <int> <dbl> <dbl>
# 1 A 1 94.9 94.9
# 2 A 2 127. 105.
# 3 A 3 86.6 99.2
# 4 A 4 184. 125.
# 5 A 5 133. 127.
# 6 A 6 87.2 115.
# 7 A 7 140. 122.
# 8 A 8 150. 131.
# 9 A 9 143. 134.
# 10 A 10 108. 126.