
rcs generates bad prediction in lm() models

I'm trying to reproduce this blog post on overfitting. I want to explore how a spline compares to the tested polynomials.

My problem: when I use rcs() (restricted cubic splines) from the rms package inside a regular lm() call, I get very strange predictions. The same model fitted with ols() works fine, so I'm a little surprised by this behavior. Can someone explain what's happening?

library(rms)      # rcs(), ols(), datadist()
library(splines)  # ns()

set.seed(1)  # make the simulated noise reproducible

# True signal: a 4th-degree polynomial over days 1..100
p4 <- poly(1:100, degree=4)
true4 <- p4 %*% c(1,2,-6,9)
days <- 1:70

# Add noise, then fit each model on the first 70 days only
noise4 <- true4 + rnorm(100, sd=.5)
reg.n4.4 <- lm(noise4[1:70] ~ poly(days, 4))
reg.n4.4ns <- lm(noise4[1:70] ~ ns(days,5))
reg.n4.4rcs <- lm(noise4[1:70] ~ rcs(days,5))
dd <- datadist(noise4[1:70], days)
options(datadist = "dd")
reg.n4.4rcs_ols <- ols(noise4[1:70] ~ rcs(days,5))

# Predict over all 100 days and overlay the four fits
plot(1:100, noise4)
nd <- data.frame(days=1:100)
lines(1:100, predict(reg.n4.4, newdata=nd), col="orange", lwd=3)
lines(1:100, predict(reg.n4.4ns, newdata=nd), col="red", lwd=3)
lines(1:100, predict(reg.n4.4rcs, newdata=nd), col="darkblue", lwd=3)
lines(1:100, predict(reg.n4.4rcs_ols, newdata=nd), col="grey", lwd=3)

legend("top", fill=c("orange", "red", "darkblue", "grey"),
       legend=c("Poly", "Natural splines", "RCS - lm", "RCS - ols"))

As you can see, the dark blue line (RCS fitted with lm) is all over the place.

[Figure: the plot produced by the code above]


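  • You can use rcs() with non-rms fitters as long as you specify the knots explicitly. For an ols object, predict dispatches to predict.ols, which is nice because it "remembers" where it put the knots when it fit the model. predict.lm does not have that functionality: it re-evaluates rcs(days, 5) on the new data, so the knots are placed according to the distribution of the new data set (days 1 to 100) rather than the distribution of the training data (days 1 to 70). The coefficients estimated for the training knots are then applied to a different spline basis, which is why the predictions look wrong.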
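As a minimal sketch of the "specify the knots" advice, assuming rms and Hmisc are installed: the knot locations are computed once from the training data with Hmisc's rcspline.eval() and then spliced into the formula as literal numbers, so lm() and predict() both build the same basis. The names train, train_knots, fml and fit_rcs_lm are illustrative only and do not appear in the question.

```r
library(rms)  # rcs(); Hmisc (a dependency of rms) provides rcspline.eval()

set.seed(1)

# Same simulated data as in the question, held in a data frame
p4     <- poly(1:100, degree = 4)
true4  <- p4 %*% c(1, 2, -6, 9)
noise4 <- as.numeric(true4) + rnorm(100, sd = 0.5)
train  <- data.frame(y = noise4[1:70], days = 1:70)

# Compute the 5 default knot locations from the TRAINING data only
train_knots <- Hmisc::rcspline.eval(train$days, nk = 5, knots.only = TRUE)

# Splice the numeric knot values into the formula, so the fitted model
# carries fixed knots instead of "5 knots, placed from whatever data I see"
fml <- eval(bquote(y ~ rcs(days, .(train_knots))))
fit_rcs_lm <- lm(fml, data = train)

# Prediction now reuses the training knots for the new data
nd   <- data.frame(days = 1:100)
pred <- predict(fit_rcs_lm, newdata = nd)

# Sanity check: predictions for days 1..70 should match the fitted values
all.equal(unname(pred[1:70]), unname(fitted(fit_rcs_lm)))
```

If the final check returns TRUE, the basis used at prediction time agrees with the one used at fit time. Beyond the last knot a restricted cubic spline is linear, so the extrapolation to days 71 to 100 will be a straight line.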