r matrix statistics gradient matrix-multiplication

Why do these two approaches to calculating the MSE gradient in R not give the same result?

I'm experimenting with approaches to speeding up the calculation of the MSE gradient function. I thought a good way to save time would be to precalculate some of the matrix operations, which I've attempted in the function mse_grad_2 below. However, when I compare to a naive function the results differ (only by a very small amount, but I want to figure out why they aren't exactly the same).

set.seed(1)
y <- rnorm(10)
X <- matrix(rnorm(10*15), ncol=15, nrow=10)
beta <- rnorm(15)

mse_grad <- function(y, X, input, n) {
  r <- -(1/n)*t(X) %*% (y - X %*% input)
  return(r)
}

mse_grad_2 <- function(XTY, XTX, input, n) { 
  r <- -(1/n)*(XTY - XTX %*% input)
  return(r)
}

mean(mse_grad(y, X, beta, 10) - mse_grad_2(t(X) %*% y, t(X) %*% X, beta, 10))
# [1] 4.810966e-17

Solution

This is due to finite-precision arithmetic issues (see e.g. https://en.wikipedia.org/wiki/Floating-point_arithmetic). For the same reasons

0.3 == 0.2 + 0.1
[1] FALSE

See below:

options(digits = 22)
0.3
[1] 0.2999999999999999888978
0.2
[1] 0.2000000000000000111022
0.1
[1] 0.1000000000000000055511
0.2 + 0.1
[1] 0.3000000000000000444089