r regression linear-regression lm least-squares

Solving normal equation gives different coefficients from using `lm`?

I wanted to fit a simple regression both with `lm` and with plain matrix algebra. However, the coefficients I get from the matrix algebra are exactly half of those returned by `lm`, and I have no clue why.

Here's the code:

boot_example <- data.frame(
  x1 = c(1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L),
  x2 = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L),
  x3 = c(1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L),
  x4 = c(0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L),
  x5 = c(1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L),
  x6 = c(0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L),
  preference_rating = c(9L, 7L, 5L, 6L, 5L, 6L, 5L, 7L, 6L)
  )
dummy_regression <- lm("preference_rating ~ x1+x2+x3+x4+x5+x6", data = boot_example)
dummy_regression

Call:
lm(formula = "preference_rating ~ x1+x2+x3+x4+x5+x6", data = boot_example)

Coefficients:
(Intercept)           x1           x2           x3           x4           x5           x6  
     4.2222       1.0000      -0.3333       1.0000       0.6667       2.3333       1.3333 

###The same by matrix algebra
X <- matrix(c(
c(1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L), #upper var
c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L), #upper var
c(1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L), #country var
c(0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L), #country var
c(1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L), #price var
c(0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L) #price var
), 
nrow = 9, ncol=6)

Y <- c(9L, 7L, 5L, 6L, 5L, 6L, 5L, 7L, 6L)

#Using standardized (mean=0, std=1) "z" -transformation Z = (X-mean(X))/sd(X) for all predictors
X_std <- apply(X, MARGIN = 2, FUN = function(x){(x-mean(x))/sd(x)})

##If constant shall be computed as well, uncomment next line 
#X_std <- cbind(c(rep(1,9)),X_std)

#using matrix algebra formula
solve(t(X_std) %*% X_std) %*% (t(X_std) %*% Y)

           [,1]
[1,]  0.5000000
[2,] -0.1666667
[3,]  0.5000000
[4,]  0.3333333
[5,]  1.1666667
[6,]  0.6666667

Does anyone see the error in my matrix computation?

Thank you in advance!

Solution

`lm` does not standardize the predictors, and it fits an intercept by default. To reproduce the `lm` coefficients with matrix algebra, use the raw `X` with a column of ones prepended:

X1 <- cbind(1, X)  ## include intercept

solve(crossprod(X1), crossprod(X1,Y))

#           [,1]
#[1,]  4.2222222
#[2,]  1.0000000
#[3,] -0.3333333
#[4,]  1.0000000
#[5,]  0.6666667
#[6,]  2.3333333
#[7,]  1.3333333
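
As for why the factor is exactly one half: each predictor column contains three 1s and six 0s, so its sample standard deviation is 0.5, and a slope on a z-scored predictor equals the raw slope times that standard deviation. The sketch below rebuilds `X` and `Y` from the question and converts the standardized slopes back; the names `col_sd`, `beta_std`, `beta_raw` and `intercept` are made up for illustration, and `scale()` is used as a stand-in for the question's `apply()` call.

```r
# Design matrix and response from the question (each group of 9 values is one column).
X <- matrix(c(
  1, 1, 1, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 1, 1, 1, 0, 0, 0,
  1, 0, 0, 1, 0, 0, 1, 0, 0,
  0, 1, 0, 0, 1, 0, 0, 1, 0,
  1, 0, 0, 0, 0, 1, 0, 1, 0,
  0, 1, 0, 1, 0, 0, 0, 0, 1
), nrow = 9, ncol = 6)
Y <- c(9, 7, 5, 6, 5, 6, 5, 7, 6)

# Sample standard deviation of every column: 0.5 for all six.
col_sd <- apply(X, 2, sd)

# z-transform the columns and solve the normal equations without an intercept,
# as in the question. Centered columns are orthogonal to the constant column,
# so leaving the intercept out does not change the slopes.
X_std <- scale(X)
beta_std <- solve(crossprod(X_std), crossprod(X_std, Y))

# Undo the scaling: raw slope = standardized slope / sd of that column.
beta_raw <- drop(beta_std) / col_sd

# Recover the intercept on the original scale from the means.
intercept <- mean(Y) - sum(beta_raw * colMeans(X))

print(c(intercept, beta_raw))
```

With these data, `beta_raw` is `beta_std` divided by 0.5, that is, doubled, which accounts for the discrepancy reported in the question.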

I won't repeat here why `crossprod` should be used instead of an explicit `t(X1) %*% X1`. See the "follow-up" part of Ridge regression with glmnet gives different coefficients than what I compute by “textbook definition”?
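
For readers who want to check this on the data above, here is a small sketch comparing the two ways of forming the normal equations and confirming the result against `lm`. R's documentation describes `crossprod(x, y)` as formally equivalent to `t(x) %*% y`; the names `same_gram`, `same_rhs`, `beta` and `beta_lm` are illustrative only.

```r
# Design matrix and response from the question (each group of 9 values is one column).
X <- matrix(c(
  1, 1, 1, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 1, 1, 1, 0, 0, 0,
  1, 0, 0, 1, 0, 0, 1, 0, 0,
  0, 1, 0, 0, 1, 0, 0, 1, 0,
  1, 0, 0, 0, 0, 1, 0, 1, 0,
  0, 1, 0, 1, 0, 0, 0, 0, 1
), nrow = 9, ncol = 6)
Y <- c(9, 7, 5, 6, 5, 6, 5, 7, 6)

X1 <- cbind(1, X)  # prepend the intercept column

# crossprod() gives the same matrices as the explicit transpose-and-multiply.
same_gram <- isTRUE(all.equal(crossprod(X1), t(X1) %*% X1))
same_rhs  <- isTRUE(all.equal(crossprod(X1, Y), t(X1) %*% Y))

# Solve the normal equations directly, without forming an explicit inverse.
beta <- solve(crossprod(X1), crossprod(X1, Y))

# Reference fit: lm() adds the intercept itself when given the raw matrix.
beta_lm <- coef(lm(Y ~ X))

print(cbind(normal_eq = drop(beta), lm = unname(beta_lm)))
```

Passing the right-hand side as the second argument of `solve()` avoids computing the inverse and then multiplying, which is the form used in the answer.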