Search code examples
rmatrixstatisticsgradientmatrix-multiplication

Why do these two approaches to calculating the MSE gradient in R not give the same result?


I'm experimenting with approaches to speeding up the calculation of the MSE gradient function. I thought a good way to save time would be to precalculate some of the matrix operations, which I've attempted in the function mse_grad_2 below. However, when I compare to a naive function the results differ (only by a very small amount, but I want to figure out why they aren't exactly the same).

set.seed(1)
y <- rnorm(10)
X <- matrix(rnorm(10*15), ncol=15, nrow=10)
beta <- rnorm(15)

mse_grad <- function(y, X, input, n) {
  r <- -(1/n)*t(X) %*% (y - X %*% input)
  return(r)
}

mse_grad_2 <- function(XTY, XTX, input, n) { 
  r <- -(1/n)*(XTY - XTX %*% input)
  return(r)
}

mean(mse_grad(y, X, beta, 10) - mse_grad_2(t(X) %*% y, t(X) %*% X, beta, 10))
# [1] 4.810966e-17

Solution

  • This is due to finite-precision arithmetic issues (see e.g. https://en.wikipedia.org/wiki/Floating-point_arithmetic). For the same reasons

    0.3 == 0.2 + 0.1
    [1] FALSE
    

    See below:

    options(digits = 22)
    0.3
    [1] 0.2999999999999999888978
    0.2
    [1] 0.2000000000000000111022
    0.1
    [1] 0.1000000000000000055511
    0.2 + 0.1
    [1] 0.3000000000000000444089