Search code examples
rregressionmass

Why do MASS:lm.ridge coefficents differ from those calculated manually?


When performing ridge regression manually, as it is defined

solve(t(X) %*% X + lbd*I) %*%t(X) %*% y

I get different results from those calculated by MASS::lm.ridge. Why? For ordinary linear regression the manual method (computing the pseudoinverse) works fine.

Here is my Minimal, Reproducible Example:

library(tidyverse)

ridgeRegression = function(X, y, lbd) {
  Rinv = solve(t(X) %*% X + lbd*diag(ncol(X)))
  t(Rinv %*% t(X) %*% y)
}

# generate some data:
set.seed(0)
tb1 = tibble(
  x0 = 1,
  x1 = seq(-1, 1, by=.01),
  x2 = x1 + rnorm(length(x1), 0, .1),
  y  = x1 + x2 + rnorm(length(x1), 0, .5)
)
X = as.matrix(tb1 %>% select(x0, x1, x2))

# sanity check: force ordinary linear regression
# and compare it with the built-in linear regression:
ridgeRegression(X, tb1$y, 0) - coef(summary(lm(y ~ x1 + x2, data=tb1)))[, 1]
# looks the same: -2.94903e-17 1.487699e-14 -2.176037e-14

# compare manual ridge regression to MASS ridge regression:
ridgeRegression(X, tb1$y, 10) - coef(MASS::lm.ridge(y ~ x0 + x1 + x2 - 1, data=tb1, lambda = 10))
# noticeably different: -0.0001407148 0.003689412 -0.08905392

Solution

  • MASS::lm.ridge scales the data before modelling - this accounts for the difference in the coefficients.

    You can confirm this by checking the function code by typing MASS::lm.ridge into the R console.

    Here is the lm.ridge function with the scaling portion commented out:

    X = as.matrix(tb1 %>% select(x0, x1, x2))
    n <- nrow(X); p <- ncol(X)
    #Xscale <- drop(rep(1/n, n) %*% X^2)^0.5
    #X <- X/rep(Xscale, rep(n, p))
    Xs <- svd(X)
    rhs <- t(Xs$u) %*% tb1$y
    d <- Xs$d
    lscoef <-  Xs$v %*% (rhs/d)
    lsfit <- X %*% lscoef
    resid <- tb1$y - lsfit
    s2 <- sum(resid^2)/(n - p)
    HKB <- (p-2)*s2/sum(lscoef^2)
    LW <- (p-2)*s2*n/sum(lsfit^2)
    k <- 1
    dx <- length(d)
    div <- d^2 + rep(10, rep(dx,k))
    a <- drop(d*rhs)/div
    dim(a) <- c(dx, k)
    coef <- Xs$v %*% a
    coef
    #             x0        x1        x2
    #[1,] 0.01384984 0.8667353 0.9452382