Search code examples
rggplot2gam

set max number of inflection points using geom_smooth


I am creating very basic plots to visualize trends in a dataset. I'm using the gam smoother with geom_smooth, but this is over-fitting the data at several sites. For example, in the very first facet, the smoother over-fits the red data points.

Is there a way to adjust my code to set a max # of inflection points to focus on the overarching trends in the dataset (e.g., max. 2 or 3 inflection points).

ggplot() +
  stat_summary(fun = "mean", 
               data = df,
               aes(x = year, y = morpho, group = interaction(year, group), col = group),
               shape = 1, alpha = 0.4) +
  geom_smooth(method = "gam", se = TRUE, 
              formula = y ~ s(x, bs = "cs"),
              data = df,
              aes(x = year, y = morpho, group = group, col = group)) +
  facet_grid(. ~ site) +
  theme_bw() +
  theme(legend.position = "bottom",
        legend.direction = "horizontal",
        axis.text.x = element_text(angle = 60, vjust = 1, hjust=1))

enter image description here

EDIT (2022-12-19): following some of the comments raised by @GavinSimpson, I've tried to make some visuals of how changing k changes the shape of geom_smooth curve.

For these plots, I've tried to create an example dataset that has a rough quadratic section followed by a roughly linear section. Ideally, I want to find a k value that would capture the inflection point at the 'peak' of the quadratic section and the transition from quadratic to linear (x ~= 5 & 10).

library(ggpubr)
library(ggplot)

df <- data.frame(
  x1 = c(1:19),
  y1 = c(1.1, 4, 6.9, 8.5, 10, 7.2, 3.7, 2.2, 1, #quadratic
         0.4, 1.6, 2.5, 3.7, 4.6, 5.8, 6.7, 7.9, 8.8, 10)) #linear

pk <- ggplot(data = df,
             aes(y = y1, x = x1)) +
  geom_point(col = "red") +
  theme_bw()

# vary k from 2 to not specified
ggarrange(nrow = 3, ncol = 2,
          pk + geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs", k = 2)) + ggtitle("k=2"),
          pk + geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs", k = 3)) + ggtitle("k=3"),
          pk + geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs", k = 4)) + ggtitle("k=4"),
          pk + geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs", k = 5)) + ggtitle("k=5"),
          pk + geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs", k = 6)) + ggtitle("k=6"),
          pk + geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs")) + ggtitle("k unconstrained"))

enter image description here

The first plot throws an error message: Warning message: In smooth.construct.cr.smooth.spec(object, data, knots) : basis dimension, k, increased to minimum possible Indicating that R is increasing the k from 2 to 3.

Only when k = 5 do we start to see the smoother begin to capture the two main inflection points in the data (x ~= 5 & 10). But what does k = 5 mean? It also seems that it's added another slight inflection point around x ~= 16.

At k = 6, the SE is reduced and the line comes even closer to each data point, and it looks nearly identical to the figure when k is not specified at all, which might be ok in this situation, but for my dataset, when k isn't specified there is a lot of over-fitting when I really just want to see the most important inflection points.

So still no clear answer on how to specify inflection points.


Solution

  • Thanks to the suggestion from @tpetzoldt, I was able to identify how to set the max. number of inflection points in the gam formula.

    ggplot() +
      stat_summary(fun = "mean", 
                   data = df,
                   aes(x = year, y = morpho, group = interaction(year, group), col = group),
                   shape = 1, alpha = 0.4) +
      geom_smooth(method = "gam", se = TRUE, 
                  formula = y ~ s(x, bs = "cs", k = 2), #k = number of inflection points
                  data = df,
                  aes(x = year, y = morpho, group = group, col = group)) +
      facet_grid(. ~ site) +
      theme_bw() +
      theme(legend.position = "bottom",
            legend.direction = "horizontal",
            axis.text.x = element_text(angle = 60, vjust = 1, hjust=1))