Search code examples
rggplot2smoothing

How to I create a custom ggplot2 smoothing stat (not just a custom lm or glm model)


I have a function which calculates the median and 90% CI using a moving window. So for each x = seq(xmin, xmax, by = wStep), I return the median and 5% and 95% quantiles for all y whose x values are less that wSize/2 away. I want to display this as a line and ribbon using ggplot2 by creating a custom smoothing function, stat_movingwindow(). I can create the result I want using geom_smooth(data = ..., stat = "identity"):

moveWin <- function(d, wSize = 0.5, wStep = 0.1, 
  f = function(x) quantile(x, prob = c(0.05,0.50,0.95), na.rm = TRUE)
){
  x <- seq(min(d$x), max(d$x), by = wStep)
  y <- matrix(NA, ncol = 3, nrow = length(x))
  for(i in seq_along(x)){
    y[i, ] <- f(d[abs(d$x - x[i]) < wSize/2, ]$y)
  }
  y <- as.tibble(y)
  colnames(y) <- c("ymin","y","ymax")
  y$x <- x
  return(as.tibble(y))
}

set.seed(123)
d <- tibble(
 x= sqrt(seq(0,1,length.out = 50))*10,
 y= rnorm(50)
)

ggplot(data = d) + aes(x = x, y = y) +
  geom_smooth(
    data    = function(d) moveWin(d, wSize = 1, wStep = 0.1), 
    mapping = aes(ymin = ymin, ymax= ymax),
    stat    = "identity") + 
  geom_point() + scale_x_continuous(breaks = 1:10)

ggplot with moving window smoothing

Following the Vignette Extending ggplot2, this is the code I've come up with so far. However, the problem is that this does not show the ribbon. Maybe I need some way to declare that this custom stat is providing the aesthetics ymin and ymax. How do I get the following code to output the similar result as above?

StatMovingWindow <- ggproto("StatMovingWindow", Stat,
  compute_group = function(data, scales, wSize, wStep, fun) {
    moveWin(data, wSize = wSize, wStep = wStep, f = fun)
  },

  required_aes = c("x", "y")
)
stat_movingwindow <- function(mapping = NULL, data = NULL, 
  fun = function(d) quantile(d, probs = c(0.05, 0.50, 0.95), na.rm = TRUE),
  wStep = 0.1, wSize = 1,
  geom = "smooth", position = "identity", show.legend = NA, inherit.aes = TRUE,
  ...
){
  layer(
    stat = StatMovingWindow, data = data, mapping = mapping, geom = geom, 
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(wStep = wStep, wSize = wSize, fun = fun, ...)
  )
}

ggplot(data = d) + aes(x = x, y = y) +
  stat_movingwindow(wStep = 0.1, wSize = 1) + 
  geom_point() + scale_x_continuous(breaks = 1:10)

custom smoothing stat does not show a ribbon


Solution

  • In your code for stat_movingwindow, the line for the corresponding geom is geom = "smooth":

    stat_movingwindow <- function(mapping = NULL, data = NULL, 
      fun = function(d) quantile(d, probs = c(0.05, 0.50, 0.95), na.rm = TRUE),
      wStep = 0.1, wSize = 1,
      geom = "smooth", # <- look here
      position = "identity", show.legend = NA, inherit.aes = TRUE,
      ...
    ){
      layer(
        stat = StatMovingWindow, data = data, mapping = mapping, geom = geom, 
        position = position, show.legend = show.legend, inherit.aes = inherit.aes,
        params = list(wStep = wStep, wSize = wSize, fun = fun, ...)
      )
    }
    

    Checking the code for geom_smooth, we see that it includes the parameter se = TRUE, and uses GeomSmooth as its geom:

    > geom_smooth
    function (mapping = NULL, data = NULL, stat = "smooth", position = "identity", 
        ..., method = "auto", formula = y ~ x, se = TRUE, # <- look here
        na.rm = FALSE, 
        show.legend = NA, inherit.aes = TRUE) 
    {
        params <- list(na.rm = na.rm, se = se, ...)
        if (identical(stat, "smooth")) {
            params$method <- method
            params$formula <- formula
        }
        layer(data = data, mapping = mapping, stat = stat, geom = GeomSmooth, # <- and here
            position = position, show.legend = show.legend, inherit.aes = inherit.aes, 
            params = params)
    }
    

    Digging deeper into GeomSmooth, we see that its draw_group function (which is responsible for plotting the smoothed line) has se = FALSE as its default parameter.

    From the code, if se == FALSE, has_ribbon would be FALSE as well even though both ymax & ymin exists in your data you thanks to the StatMovingWindow$compute_group function. And this in turn means the only the result of GeomLine$draw_panel(path, panel_params, coord) would be returned alone, without GeomRibbon$draw_group(ribbon, panel_params, coord).

    > GeomSmooth$draw_group
    <ggproto method>
      <Wrapper function>
        function (...) 
    f(...)
    
      <Inner function (f)>
        function (data, panel_params, coord, se = FALSE) # <- look here
    {
        ribbon <- transform(data, colour = NA)
        path <- transform(data, alpha = NA)
        has_ribbon <- se && !is.null(data$ymax) && !is.null(data$ymin) # <- and here
        gList(if (has_ribbon) GeomRibbon$draw_group(ribbon, panel_params, coord), 
              GeomLine$draw_panel(path, panel_params, coord))
    }
    

    In short, geom_smooth's default parameter of se = TRUE overrides the default behaviour in GeomSmooth$draw_group, (the same holds for stat_smooth too) and we should do the same in stat_movingwindow if we want to achieve the same result.

    If you expect you'll usually want the ribbon to be plotted, you can include se = TRUE as a parameter in your definition for stat_movingwindow. If it's going to be on an ad hoc basis, you can include it whenever necessary in your code instead.