Search code examples
rggplot2gammgcv

ggplot confidence bands from gam predict$fit and predict$se.fit


I have the following variables:

prod: positive integer

tenure: positive numeric

cohort: factor

Here is some simulated data with these specifications.

set.seed(123)
my_data <- data.frame(prod   = rnbinom(10000, mu = 2.5, size = 1.5),
                      tenure = rexp(10000),
                      cohort = factor(sample(2011:2014, size = 10000, replace = TRUE,
                                             prob = c(0.17, 0.49, 0.26, 0.08))))

I have fit the following model using mgcv:gam:

library(mgcv)
mod <- gam(prod ~ s(tenure, by = cohort) + cohort, data = my_data, family = nb())

The get the predictions and their standard errors:

preds   <- predict(mod, se.fit = TRUE)
my_data <- data.frame(my_data,
                      mu   = exp(preds$fit),
                      low  = exp(preds$fit - 1.96 * preds$se.fit),
                      high = exp(preds$fit + 1.96 * preds$se.fit))

It is fairly straightforward to use package:ggplot2 to acquire the smoothed predictions mu for each cohort (while also forcing the smoother to have positive values):

library(magrittr)
library(ggplot2)
library(splines)
my_plot <-
  ggplot(my_data, aes(x = tenure, y = mu, color = cohort)) %>%
  + geom_smooth(method  = "glm",
                formula = y ~ ns(x, 3),
                family  = "quasipoisson",
                fill    = NA)

But I would like to have smoothed confidence bands from the GAM. How do I add those?

Not the answer

  1. Remove fill = NA. Nope. Those confidence bands would be infinitely small because the prediction by tenure is exactly the same within a cohort.
  2. Add a call to geom_ribbon(aes(x = tenure, ymin = low, ymax = high)). Nope. That gives me a super-wiggly, non-smoothed confidence band.
  3. Use package:ggvis! No package:ggvis answers, please, unless there is no way to do this in ggplot2. My current plotting framework is ggplot2, and I'm sticking with it for now unless I must switch in order to do this plot.

Solution

  • This worked for me.

    require(ggplot2)
    require(mgcv)
    
    set.seed(123)
    my_data <- data.frame(prod   = rnbinom(10000, mu = 2.5, size = 1.5),
                          tenure = rexp(10000),
                          cohort = factor(sample(2011:2014, size = 10000, replace = TRUE,
                                                 prob = c(0.17, 0.49, 0.26, 0.08))))
    mod <- gam(prod ~ s(tenure, by = cohort) + cohort, data = my_data, family = nb())
    preds   <- predict(mod, se.fit = TRUE)
    my_data <- data.frame(my_data,
                          mu   = exp(preds$fit),
                          low  = exp(preds$fit - 1.96 * preds$se.fit),
                          high = exp(preds$fit + 1.96 * preds$se.fit))
    
    ggplot(my_data, aes(x = tenure, y = prod, color = cohort)) +
      geom_point() + 
      geom_smooth(aes(ymin = low, ymax = high, y = mu), stat = "identity")
    

    enter image description here