Search code examples
rregressionlinear-regression

R - how to get to the underlying matrices while using regression in lm?


Imagine you have data like this:

Y <- rnorm(100)
x <- rnorm(100)
z <- rnorm(100)

data <- data.frame(Y, x, z)

To this we can fit a linear model:

enter image description here

fit <- lm(Y ~ x + z, data)

The article states that the regression estimation is done by solving a system of normal equations:

enter image description here

Or in matrix form:

enter image description here

Having fit a lm object, how could I extract the d and S matrices?


Solution

  • We can get X and Y from fm as shown. Then use crossprod to compute XtX and XtY and then solve to get the coefficients.

    fm <- lm(mpg ~ cyl + carb, mtcars)
     
    X <- model.matrix(fm)
    
    Y <- model.response(model.frame(fm))
    # or
    Y <- fitted(fm) + resid(fm)
    
    XtX <- crossprod(X); XtX
    ##             (Intercept)  cyl carb
    ## (Intercept)          32  198   90
    ## cyl                 198 1324  604
    ## carb                 90  604  334
    
    XtY <- crossprod(X,Y); XtY
    ##               [,1]
    ## (Intercept)  642.9
    ## cyl         3693.6
    ## carb        1641.9
    
    c(solve(XtX, XtY))
    ## [1] 37.812739 -2.625023 -0.526146
    
    coef(fm)
    ## (Intercept)         cyl        carb 
    ##   37.812739   -2.625023   -0.526146