Search code examples
rtime-seriesmoving-average

Comparing a plot of moving averages over different years in R language


I would like a simple way in R to visualize moving average data and compare the evolution over the years.

For example, let's say I have a time series over 5 years and I plot the moving average :

library(tidyverse) 
library(tidyquant)
days <- seq(as.Date("2018-01-01"), as.Date("2023-03-15"), by = "days")

# let's generate a series roughly linear with time with random noise and a seasonal effect
sales <- 2 * as.double(days) /1000 - sin(as.double(days) /365 *2 *3.14) + rnorm(1900)
 
data <- tibble(days, sales)
data %>%
  ggplot(aes(days, sales)) +
    geom_point(color = "grey") +
    geom_ma(ma_fun = SMA,linetype = 1, size = 1 , n = 14,color = "orange") 

tidyquant:geom_ma is very convenient, I can fiddle with n to find the right span for the moving average and make it smooth. However, it can be difficult to tell apart the seasonal effect from the trend.

What I would like is to have the same graphic , but overlaying all the sales figures over the span of one year (from 1/01 to 31/12), with a different color for the moving average of each year. I could then quickly see if there is a seasonal effect and a trend.

I understand I could calculate the moving_average in the data set, but it must be done before splitting by year for boundary reasons and it kind of defeats the purpose of having a simple geom_ma graph. On top of that, I failed to find an obvious function to do it.

How would you address this issue ? Thanks


Solution

  • Edit Added a moving average using the slider package. This specific code adds a moving average of order 5, but that's flexible.

    Using gg_season from the feasts package:

    library(fpp3)
    #> ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──
    #> ✔ tibble      3.2.1     ✔ tsibble     1.1.3
    #> ✔ dplyr       1.1.0     ✔ tsibbledata 0.4.1
    #> ✔ tidyr       1.3.0     ✔ feasts      0.3.0
    #> ✔ lubridate   1.9.2     ✔ fable       0.3.2
    #> ✔ ggplot2     3.4.1     ✔ fabletools  0.3.2
    #> ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
    #> ✖ lubridate::date()    masks base::date()
    #> ✖ dplyr::filter()      masks stats::filter()
    #> ✖ tsibble::intersect() masks base::intersect()
    #> ✖ tsibble::interval()  masks lubridate::interval()
    #> ✖ dplyr::lag()         masks stats::lag()
    #> ✖ tsibble::setdiff()   masks base::setdiff()
    #> ✖ tsibble::union()     masks base::union()
    library(slider)
    days <- seq(as.Date("2018-01-01"), as.Date("2023-03-15"), by = "days")
    
    sales <- 2 * as.double(days) /1000 - sin(as.double(days) /365 *2 *3.14) + rnorm(1900)
    data <- tibble(days, sales)
    
    data %>%
      as_tsibble(index = days) %>%
      mutate(MA5 = slide_dbl(sales,
                                  mean,
                                  .before = 2,
                                  .after = 2,
                                  .complete = TRUE)) %>%
      gg_season(MA5, labels = 'both')
    #> Warning: Removed 4 rows containing missing values (`geom_line()`).
    #> Warning: Removed 1 rows containing missing values (`geom_text()`).
    #> Removed 1 rows containing missing values (`geom_text()`).
    

    Created on 2023-03-23 with reprex v2.0.2