I'm experimenting with approaches to speeding up the calculation of the MSE gradient function. I thought a good way to save time would be to precalculate some of the matrix operations, which I've attempted in the function mse_grad_2
below. However, when I compare to a naive function the results differ (only by a very small amount, but I want to figure out why they aren't exactly the same).
set.seed(1)
y <- rnorm(10)
X <- matrix(rnorm(10*15), ncol=15, nrow=10)
beta <- rnorm(15)
mse_grad <- function(y, X, input, n) {
r <- -(1/n)*t(X) %*% (y - X %*% input)
return(r)
}
mse_grad_2 <- function(XTY, XTX, input, n) {
r <- -(1/n)*(XTY - XTX %*% input)
return(r)
}
mean(mse_grad(y, X, beta, 10) - mse_grad_2(t(X) %*% y, t(X) %*% X, beta, 10))
# [1] 4.810966e-17
This is due to finite-precision arithmetic issues (see e.g. https://en.wikipedia.org/wiki/Floating-point_arithmetic). For the same reasons
0.3 == 0.2 + 0.1
[1] FALSE
See below:
options(digits = 22)
0.3
[1] 0.2999999999999999888978
0.2
[1] 0.2000000000000000111022
0.1
[1] 0.1000000000000000055511
0.2 + 0.1
[1] 0.3000000000000000444089