r, ggplot2, fable-r

How to add a normal density curve to a ggplot histogram of residuals


I would like to overlay, on the histogram of a model's residuals, the density of a normal distribution with the same mean and variance as those residuals.

I am using the code below from chapter 5 of fpp3:

aug <- google_2015 |>
    model(NAIVE(Close)) |>
    augment()

autoplot(aug, .innov) +
    labs(y = "$US",
         title = "Residuals from the naïve method")

aug |>
    ggplot(aes(x = .innov)) +
    geom_histogram() +
    labs(title = "Histogram of residuals")

Solution

  • It would have helped to tell us that fpp3 is "Forecasting: Principles and Practice" (3rd edition), that there is a corresponding R package, and to show the code that creates the google_2015 object so we didn't have to dig it out, but here you go ...

    setup

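    library(fpp3)  # attaches tsibble, fable, feasts, dplyr, ggplot2, lubridate, ...
    # Google's 2015 closing prices, re-indexed by trading day because
    # trading days are irregular (no weekends or holidays)
    google_2015 <- gafa_stock |>
      filter(Symbol == "GOOG", year(Date) >= 2015) |>
      mutate(day = row_number()) |>
      update_tsibble(index = day, regular = TRUE) |>
      filter(year(Date) == 2015)
    # fit the naive model; augment() adds .fitted, .resid and .innov columns
    aug <- google_2015 |>
        model(NAIVE(Close)) |>
        augment()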
    

    plotting code

    We need to (1) plot the histogram on a density rather than a count scale and (2) compute the required mean and SD on the fly. (Alternatively, we could keep the histogram on the count scale and multiply the Normal density by the number of non-missing observations times the bin width, but that seems slightly harder.)
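    For comparison, here is a minimal sketch of that count-scale alternative. It assumes a fixed bin width (2.5, an arbitrary choice) and uses simulated stand-in residuals rather than the Google data; `res`, `binwidth` and `n_obs` are illustrative names, not anything from fpp3. The point is that a count-scale bar has expected height n × binwidth × density, so the curve has to be scaled by that same factor:

```r
library(ggplot2)

# Simulated stand-in for the residuals (replace with aug$.innov).
# The leading NA mimics the first residual of a naive model.
set.seed(1)
res <- c(NA, rnorm(250, mean = 0, sd = 10))
res_df <- data.frame(.innov = res)

binwidth <- 2.5                    # fix the bin width so the scale factor is known
n_obs    <- sum(!is.na(res))       # number of non-missing residuals
mu       <- mean(res, na.rm = TRUE)
sig      <- sd(res, na.rm = TRUE)

# On the count scale a bar's expected height is n_obs * binwidth * density,
# so the Normal density is multiplied by the same factor before plotting.
p_count <- ggplot(res_df, aes(x = .innov)) +
  geom_histogram(binwidth = binwidth, na.rm = TRUE) +
  geom_function(fun = function(x) n_obs * binwidth * dnorm(x, mean = mu, sd = sig),
                colour = "red", n = 1001)
```

    Print `p_count` to draw it. If you change `binwidth` in the histogram but not in the scale factor, the curve and the bars drift apart, which is why the density-scale version is the less fragile choice.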

    The only improvement of this answer over the linked duplicate is that the aes(y = ..density..) idiom used in those answers has been deprecated since ggplot2 3.4.0 and throws a warning; after_stat(density) is the current replacement.

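    # (1) histogram on the density scale; after_stat() replaces the deprecated ..density..
    gg0 <- aug |>
        ggplot(aes(x = .innov)) +
        geom_histogram(aes(y = after_stat(density)))
    # (2) overlay a Normal density with the residuals' mean and SD;
    # na.rm = TRUE because the first naive residual is NA
    gg0 + geom_function(fun = dnorm, colour = "red", n = 1001,
             args = list(mean = mean(aug$.innov, na.rm = TRUE),
                         sd = sd(aug$.innov, na.rm = TRUE)))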
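    If you need this for several models, one option is to wrap the two layers in a small function. `hist_with_normal()` below is a hypothetical helper (not part of ggplot2, fable or fpp3), sketched here with simulated residuals; with the setup code loaded you would call `hist_with_normal(aug, ".innov")` instead. It also shows why step (1) matters: on the density scale the bars' total area is 1, the same as the area under `dnorm()`, so the curve needs no rescaling.

```r
library(ggplot2)

# Hypothetical helper: histogram of a numeric column on the density scale
# with a Normal curve that uses the column's own mean and SD.
hist_with_normal <- function(data, col, bins = 30) {
  x   <- data[[col]]
  mu  <- mean(x, na.rm = TRUE)
  sig <- sd(x, na.rm = TRUE)
  ggplot(data, aes(x = .data[[col]])) +
    # after_stat(density) rescales bar heights so the bars' total area is 1
    geom_histogram(aes(y = after_stat(density)), bins = bins, na.rm = TRUE) +
    geom_function(fun = dnorm, args = list(mean = mu, sd = sig),
                  colour = "red", n = 1001)
}

# Simulated stand-in residuals for illustration (leading NA as with NAIVE)
set.seed(42)
res_df <- data.frame(.innov = c(NA, rnorm(250, sd = 10)))
p_dens <- hist_with_normal(res_df, ".innov")
```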