Tags: r, regression, least-squares

Different results when lm() is used vs. matrix multiplication formula


I am running a simple multivariate regression on a panel/time-series dataset, using both lm() and the underlying normal-equations formula $(X'X)^{-1} X'Y$.

I'm expecting to get the same coefficient values from the two methods. However, I get completely different estimates.

Here is the R code:

  return = matrix(ret.ff.zoo, ncol = 50)  # y: response matrix (50 series)
  data = cbind(df$EQ, df$EFF, df$SIZE, df$MOM, df$MSCR, df$SY, df$UMP)   # X: predictor matrix
  
  #First method     
  BETA = solve(crossprod(data)) %*% crossprod(data, return)
  
  #Second method
  OLS <- lm(return ~ data)

I am not sure why the estimates are different between the two methods.


Solution

  • Your example isn't reproducible, but if you try it with some dummy data, the matrix formula and lm() produce the same results once you omit the intercept:

    set.seed(1)
    
    x <- matrix(rnorm(1000), ncol = 5)   # 200 x 5 predictor matrix
    y <- rnorm(200)                      # response vector
    
    solve(t(x) %*% x) %*% t(x) %*% y     # (X'X)^-1 X'Y
                  [,1]
    [1,] -0.0826496646
    [2,] -0.0165735273
    [3,] -0.0009412659
    [4,]  0.0070475728
    [5,] -0.0642452777
    > lm(y ~ x + 0)
    
    Call:
    lm(formula = y ~ x + 0)
    
    Coefficients:
            x1          x2          x3          x4          x5  
    -0.0826497  -0.0165735  -0.0009413   0.0070476  -0.0642453
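
  • If you do want the intercept that lm() adds by default, the matrix-formula equivalent is to prepend a column of ones to the design matrix. A minimal sketch continuing the dummy data above (the name x1 is purely illustrative):

    x1 <- cbind(1, x)                    # design matrix with an intercept column of ones
    solve(t(x1) %*% x1) %*% t(x1) %*% y  # first element is the intercept estimate
    lm(y ~ x)                            # same coefficients, intercept included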