r ggplot2 spline

How to visualize spline regression with ggplot2?


I'm working with the Wage dataset in the ISLR library. My objective is to perform a spline regression with knots at 3 locations (see code below). I can do this regression. That part is fine.

My issue concerns the visualization of the regression curve. Using base R functions, I seem to get the correct curve. But I can't seem to get quite the right curve using the tidyverse. This is what is expected, and what I get with the base functions:

[Image 1]

This is what ggplot spits out:

[Image 2]

It's noticeably different. R gives me the following message when running the ggplot functions:

`geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

What does this mean and how do I fix it?

library(tidyverse)
library(ISLR)
library(splines)  # provides bs()
attach(Wage)

agelims <- range(age)
age.grid <- seq(from = agelims[1], to = agelims[2])

fit <- lm(wage ~ bs(age, knots = c(25, 40, 60), degree = 3), data = Wage)  # degree = 3 is the default

plot(age, wage, col = 'grey', xlab = 'Age', ylab = 'Wages')
points(age.grid, predict(fit, newdata = list(age = age.grid)), col = 'darkgreen', lwd = 2, type = "l")
abline(v = c(25, 40, 60), lty = 2, col = 'darkgreen')

ggplot(data = Wage) +
  geom_point(mapping = aes(x = age, y = wage), color = 'grey') +
  geom_smooth(mapping = aes(x = age, y = fit$fitted.values), color = 'red')

I also tried

ggplot() +
  geom_point(data = Wage, mapping = aes(x = age, y = wage), color = 'grey') +
  geom_smooth(mapping = aes(x = age.grid, y = predict(fit, newdata = list(age = age.grid))), color = 'red')

but that looks very similar to the 2nd picture.

Thanks for any help!


Solution

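  • splines::bs() and s(., bs = "cs") from mgcv do very different things; the latter is a penalized regression spline. The message says geom_smooth() chose its own default smoother (a GAM from mgcv) and fitted it to the y values you supplied, instead of drawing the curve from your lm fit. I would try (untested!)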

    geom_smooth(method = "lm",
                formula = y ~ splines::bs(x, knots = c(25, 40, 60), degree = 3))
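For completeness, here is a minimal sketch of how that suggestion might look in a full plot, plus a second option that draws the predictions from the existing lm fit with geom_line(). It assumes the ggplot2 and ISLR packages are installed. The object names (fit, age_grid, p_smooth, p_line), the colors, and se = FALSE are my own choices for illustration, not part of the original answer.

```r
library(ggplot2)
library(ISLR)  # provides the Wage data set

# Same model as in the question, fitted without attach().
fit <- lm(wage ~ splines::bs(age, knots = c(25, 40, 60), degree = 3),
          data = Wage)

# Option 1: let geom_smooth() refit the same lm from the x/y aesthetics.
# Inside the formula the variables must be called x and y.
p_smooth <- ggplot(Wage, aes(x = age, y = wage)) +
  geom_point(color = "grey") +
  geom_smooth(method = "lm",
              formula = y ~ splines::bs(x, knots = c(25, 40, 60), degree = 3),
              se = FALSE, color = "darkgreen") +
  geom_vline(xintercept = c(25, 40, 60), linetype = "dashed",
             color = "darkgreen")

# Option 2: predict from the existing fit on a grid of ages and draw the
# predictions with geom_line(), which does no smoothing of its own.
age_grid <- data.frame(age = seq(min(Wage$age), max(Wage$age)))
age_grid$wage <- predict(fit, newdata = age_grid)

p_line <- ggplot(Wage, aes(x = age, y = wage)) +
  geom_point(color = "grey") +
  geom_line(data = age_grid, color = "darkgreen") +
  geom_vline(xintercept = c(25, 40, 60), linetype = "dashed",
             color = "darkgreen")
```

Print p_smooth or p_line to display the plot. Option 2 mirrors the base R code most directly, since it plots predict() output as a line rather than asking geom_smooth() to fit anything.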