I have the following variables:
prod
: positive integer
tenure
: positive numeric
cohort
: factor
Here is some simulated data with these specifications.
set.seed(123)
my_data <- data.frame(prod = rnbinom(10000, mu = 2.5, size = 1.5),
tenure = rexp(10000),
cohort = factor(sample(2011:2014, size = 10000, replace = TRUE,
prob = c(0.17, 0.49, 0.26, 0.08))))
I have fit the following model using mgcv:gam
:
library(mgcv)
mod <- gam(prod ~ s(tenure, by = cohort) + cohort, data = my_data, family = nb())
The get the predictions and their standard errors:
preds <- predict(mod, se.fit = TRUE)
my_data <- data.frame(my_data,
mu = exp(preds$fit),
low = exp(preds$fit - 1.96 * preds$se.fit),
high = exp(preds$fit + 1.96 * preds$se.fit))
It is fairly straightforward to use package:ggplot2
to acquire the smoothed predictions mu
for each cohort (while also forcing the smoother to have positive values):
library(magrittr)
library(ggplot2)
library(splines)
my_plot <-
ggplot(my_data, aes(x = tenure, y = mu, color = cohort)) %>%
+ geom_smooth(method = "glm",
formula = y ~ ns(x, 3),
family = "quasipoisson",
fill = NA)
But I would like to have smoothed confidence bands from the GAM. How do I add those?
Not the answer
fill = NA
. Nope. Those confidence bands would be infinitely small because the prediction by tenure is exactly the same within a cohort.geom_ribbon(aes(x = tenure, ymin = low, ymax = high))
. Nope. That gives me a super-wiggly, non-smoothed confidence band.package:ggvis
! No package:ggvis
answers, please, unless there is no way to do this in ggplot2
. My current plotting framework is ggplot2
, and I'm sticking with it for now unless I must switch in order to do this plot.This worked for me.
require(ggplot2)
require(mgcv)
set.seed(123)
my_data <- data.frame(prod = rnbinom(10000, mu = 2.5, size = 1.5),
tenure = rexp(10000),
cohort = factor(sample(2011:2014, size = 10000, replace = TRUE,
prob = c(0.17, 0.49, 0.26, 0.08))))
mod <- gam(prod ~ s(tenure, by = cohort) + cohort, data = my_data, family = nb())
preds <- predict(mod, se.fit = TRUE)
my_data <- data.frame(my_data,
mu = exp(preds$fit),
low = exp(preds$fit - 1.96 * preds$se.fit),
high = exp(preds$fit + 1.96 * preds$se.fit))
ggplot(my_data, aes(x = tenure, y = prod, color = cohort)) +
geom_point() +
geom_smooth(aes(ymin = low, ymax = high, y = mu), stat = "identity")