I'm working with the Wage dataset in the ISLR library. My objective is to perform a spline regression with knots at 3 locations (see code below). I can do this regression. That part is fine.
My issue concerns the visualization of the regression curve. Using base R functions, I seem to get the correct curve. But I can't seem to get quite the right curve using the tidyverse. This is what is expected, and what I get with the base functions:
This is what ggplot spits out
It's noticeably different. R gives me the following message when running the ggplot functions:
`geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
What does this mean and how do I fix it?
library(tidyverse)
library(ISLR)
library(splines) # provides bs()
attach(Wage)
agelims <- range(age)
age.grid <- seq(from = agelims[1], to = agelims[2])
fit <- lm(wage ~ bs(age, knots = c(25, 40, 60), degree = 3), data = Wage) #Default is 3
plot(age, wage, col = 'grey', xlab = 'Age', ylab = 'Wages')
points(age.grid, predict(fit, newdata = list(age = age.grid)), col = 'darkgreen', lwd = 2, type = "l")
abline(v = c(25, 40, 60), lty = 2, col = 'darkgreen')
ggplot(data = Wage) +
geom_point(mapping = aes(x = age, y = wage), color = 'grey') +
geom_smooth(mapping = aes(x = age, y = fit$fitted.values), color = 'red')
I also tried
ggplot() +
geom_point(data = Wage, mapping = aes(x = age, y = wage), color = 'grey') +
geom_smooth(mapping = aes(x = age.grid, y = predict(fit, newdata = list(age = age.grid))), color = 'red')
but that looks very similar to the 2nd picture.
Thanks for any help!
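That message means geom_smooth() is not using your lm() fit: it fits its own smoother to whatever x and y you hand it, and since no method was given it chose a GAM with the formula y ~ s(x, bs = "cs"). splines::bs() and mgcv's s(x, bs = "cs") do very different things; the latter is a penalized regression spline. I would try (untested!)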
geom_smooth(method = "lm",
            formula = y ~ splines::bs(x, knots = c(25, 40, 60), degree = 3))
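For context, here is a slightly fuller sketch of that suggestion. It assumes ggplot2, splines and ISLR are installed. The object names (p_smooth, p_line, age_grid, wage_hat) are my own and not from the question. The first plot lets geom_smooth() fit the same lm() spline itself. The second draws predictions from an already fitted model with geom_line(), which is probably the better tool when you only want to display an existing fit, since geom_smooth() always refits.

```r
library(ggplot2)
library(splines)
library(ISLR)  # assumed installed; supplies the Wage data

# Option 1: have geom_smooth() fit the lm() spline.
# Inside the formula, x and y refer to the mapped aesthetics, not column names.
p_smooth <- ggplot(Wage, aes(x = age, y = wage)) +
  geom_point(color = "grey") +
  geom_smooth(method = "lm",
              formula = y ~ bs(x, knots = c(25, 40, 60), degree = 3),
              color = "red") +
  geom_vline(xintercept = c(25, 40, 60), linetype = 2, color = "darkgreen")

# Option 2: fit the model yourself and draw its predictions as a plain line.
fit <- lm(wage ~ bs(age, knots = c(25, 40, 60), degree = 3), data = Wage)
age_grid <- data.frame(age = seq(min(Wage$age), max(Wage$age)))
age_grid$wage_hat <- predict(fit, newdata = age_grid)  # wage_hat: illustrative name

p_line <- ggplot(Wage, aes(x = age, y = wage)) +
  geom_point(color = "grey") +
  geom_line(data = age_grid, aes(y = wage_hat), color = "red")

print(p_smooth)
print(p_line)
```

Both plots should show the same curve as the base R version. Option 1 also adds a confidence band by default, which you can turn off with se = FALSE.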