Tags: r, ggplot2, lm, smoothing, facet-wrap

How can I calculate an entire family of smooths using geom_smooth() and display each one using facet_wrap()?


In the documentation for geom_smooth(), there is an example that shows how to fit a B-spline smooth to the hwy vs. displ columns of the tidyverse mpg dataset, using a parameter setting for the bs() function of df=3:

[Image: geom_smooth() usage example]

I'd like to repeat that example, but instead of computing a single smooth with a single setting for the df parameter, I'd like to use a range of df values (for example 3, 5, 7, 9) to calculate a series of smooths, and then display each smooth in its own panel using facet_wrap(). As a minor addition, I also want the gray-shaded confidence interval drawn around each smooth curve. However, I can't figure out what syntax to use, or whether ggplot2 is even flexible enough to support this kind of computation directly inside geom_smooth().

I've posted a MWE below:

library(tidyverse)
library(splines)

# ---- Preface with optional additional problem context ----

# This fits 4 different B-splines to the "hwy" vs. "displ" columns of the 
# tidyverse "mpg" tibble, with the bs() df parameter set to c(3, 5, 7, 9).
# This is essentially representative of the kind of result I want, except
# that instead of computing it externally and saving the result to a list
# as I've done here, I want to do it automatically inside of geom_smooth().
fitobj <- list()
for(ii in seq(3,9,2)) {
  fitobj[[as.character(ii)]] <- lm(formula = hwy ~ bs(displ, df=ii), data=mpg)
}

# ---- MWE really starts here ----

# Make 4 identical copies of the "mpg" tibble, with an extra column tacked
# onto the right containing values 3, 5, 7, 9
mpg_rep <- NULL
for(ii in seq(3,9,2)) {
    tbl <- mpg
    tbl$splinedf <- ii
    mpg_rep <- bind_rows(mpg_rep, tbl)  
}

# Make a baseline plot; smooths will be appended afterward
plt <- ggplot(mpg_rep, aes(x=displ, y=hwy, group=splinedf)) +
       geom_point() +
       facet_wrap(~splinedf)

# This does _almost_ what I want, except that instead of plotting a different
# smooth in each panel, it plots the same smooth four times redundantly
print(plt + geom_smooth(method = lm, formula = y ~ bs(x, df=3)))

# This looks like it has sort of the right syntax to do what I want, however
# it returns an error message; I guess perhaps because I'm not allowed to
# reference an aesthetic like this inside a formula?
print(plt + geom_smooth(method = lm, formula = y ~ bs(x, df=splinedf)))

This is an example output that looks almost like what I want, except that I want 4 different smooths instead of the same smooth 4 times:

[Image: example output showing one smooth repeated 4 times instead of 4 unique smooths]

How can I revise the MWE to get it to do exactly what I want?


Solution

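  • You could use lapply() to add one geom_smooth() layer per df value, giving each layer's data its own value of a new facet variable. Layers whose data lack that variable (here geom_point()) are drawn in every panel, and geom_smooth() draws the confidence band by default (se = TRUE).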

    library(ggplot2)
    
    ggplot(mpg, aes(displ, hwy)) +
      geom_point() +
      lapply(c(3,5,7,9), function(i) {
        geom_smooth(
          data = ~ cbind(., facet = i),
          method = lm,
          formula = y ~ splines::bs(x, i)
        )
      }) +
      facet_wrap(vars(facet))
    

    Created on 2021-04-21 by the reprex package (v1.0.0)
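
If you would rather keep the fits outside of geom_smooth(), as in the question's preface, a rough alternative is to compute the predictions and confidence limits yourself and draw them with geom_ribbon() and geom_line(). This is only a sketch under some assumptions: the names dfs, grid, pred, and p are made up for illustration, the band is the 95% confidence interval returned by predict(), and the fill and line colours are arbitrary choices rather than an exact match for the geom_smooth() defaults.

```r
library(ggplot2)
library(splines)

dfs <- c(3, 5, 7, 9)  # df values to compare (taken from the question)

# Grid of displ values at which each fitted spline is evaluated
grid <- data.frame(
  displ = seq(min(mpg$displ), max(mpg$displ), length.out = 100)
)

# Fit one lm per df value and collect the fitted curve plus
# 95% confidence limits (columns fit, lwr, upr) in one data frame
pred <- do.call(rbind, lapply(dfs, function(i) {
  fit <- lm(hwy ~ bs(displ, df = i), data = mpg)
  ci <- predict(fit, newdata = grid, interval = "confidence")
  data.frame(grid, ci, splinedf = i)
}))

# The points come from mpg, which has no splinedf column,
# so they are repeated in every panel
p <- ggplot(pred, aes(displ)) +
  geom_point(aes(y = hwy), data = mpg) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), fill = "grey60", alpha = 0.4) +
  geom_line(aes(y = fit), colour = "blue") +
  facet_wrap(vars(splinedf))

print(p)
```

Compared with the lapply() approach, this keeps the fitted model for each df value available for inspection, at the cost of building the band by hand.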