I would like a simple way in R to visualize moving average data and compare the evolution over the years.
For example, let's say I have a time series over 5 years and I plot the moving average :
library(tidyverse)
library(tidyquant)
days <- seq(as.Date("2018-01-01"), as.Date("2023-03-15"), by = "days")
# let's generate a series roughly linear with time with random noise and a seasonal effect
sales <- 2 * as.double(days) /1000 - sin(as.double(days) /365 *2 *3.14) + rnorm(1900)
data <- tibble(days, sales)
data %>%
ggplot(aes(days, sales)) +
geom_point(color = "grey") +
geom_ma(ma_fun = SMA,linetype = 1, size = 1 , n = 14,color = "orange")
tidyquant:geom_ma is very convenient, I can fiddle with n to find the right span for the moving average and make it smooth. However, it can be difficult to tell apart the seasonal effect from the trend.
What I would like is to have the same graphic , but overlaying all the sales figures over the span of one year (from 1/01 to 31/12), with a different color for the moving average of each year. I could then quickly see if there is a seasonal effect and a trend.
I understand I could calculate the moving_average in the data set, but it must be done before splitting by year for boundary reasons and it kind of defeats the purpose of having a simple geom_ma graph. On top of that, I failed to find an obvious function to do it.
How would you address this issue ? Thanks
Edit Added a moving average using the slider
package. This specific code adds a moving average of order 5, but that's flexible.
Using gg_season
from the feasts
package:
library(fpp3)
#> ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──
#> ✔ tibble 3.2.1 ✔ tsibble 1.1.3
#> ✔ dplyr 1.1.0 ✔ tsibbledata 0.4.1
#> ✔ tidyr 1.3.0 ✔ feasts 0.3.0
#> ✔ lubridate 1.9.2 ✔ fable 0.3.2
#> ✔ ggplot2 3.4.1 ✔ fabletools 0.3.2
#> ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
#> ✖ lubridate::date() masks base::date()
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ tsibble::intersect() masks base::intersect()
#> ✖ tsibble::interval() masks lubridate::interval()
#> ✖ dplyr::lag() masks stats::lag()
#> ✖ tsibble::setdiff() masks base::setdiff()
#> ✖ tsibble::union() masks base::union()
library(slider)
days <- seq(as.Date("2018-01-01"), as.Date("2023-03-15"), by = "days")
sales <- 2 * as.double(days) /1000 - sin(as.double(days) /365 *2 *3.14) + rnorm(1900)
data <- tibble(days, sales)
data %>%
as_tsibble(index = days) %>%
mutate(MA5 = slide_dbl(sales,
mean,
.before = 2,
.after = 2,
.complete = TRUE)) %>%
gg_season(MA5, labels = 'both')
#> Warning: Removed 4 rows containing missing values (`geom_line()`).
#> Warning: Removed 1 rows containing missing values (`geom_text()`).
#> Removed 1 rows containing missing values (`geom_text()`).
Created on 2023-03-23 with reprex v2.0.2