Search code examples
rlmpoly

How is the fitted.values from lm function calculated?


I'm adjusting a response surface using the lm function on R. On output, using fit[["fitted.values"]], the reported values are consistent with the actual values. However, when I use the coefficients generated for each effect (Intercept, x1, x1*x2) and do the manual calculation, the values presented are very discrepant.

How does the fitted.values function do its calculations?

Below, part of the code and outputs:

data = read.csv("data.csv", header = TRUE, sep = ";")
head(data)
fit <- lm(rend ~ poly(wis, enz, degree = 2), data = data)
summary(fit)

The output for head(data):

    wis enz rend
1 10.00   3   68
2 10.25   3   66
3 10.50   3   64
4 10.75   3   62
5 11.00   3   61
6 11.25   3   60

Part of the output of fit[["fitted.values"]]:

      1        2        3        4        5        6        7        8        9       10       11       12 
67.02258 65.46832 64.01733 62.66962 61.42517 60.28399 59.24609 58.31146 57.48009 56.75200 56.12718 55.60564 
      13       14       15       16       17       18       19       20       21       22       23       24 
55.18736 54.87235 54.66062 54.55215 54.54696 54.64504 54.84639 55.15101 55.55891 56.07007 56.68450 57.40221

And the summary coeficients:

Coefficients:
                                Estimate Std. Error  t value Pr(>|t|)    
(Intercept)                     66.23929    0.02293 2888.536   <2e-16 ***
poly(wis, enz, degree = 2)1.0  -24.26976    0.70940  -34.211   <2e-16 ***
poly(wis, enz, degree = 2)2.0  129.35949    0.70940  182.349   <2e-16 ***
poly(wis, enz, degree = 2)0.1  130.19258    0.70940  183.524   <2e-16 ***
poly(wis, enz, degree = 2)1.1 -701.60447   21.94572  -31.970   <2e-16 ***
poly(wis, enz, degree = 2)0.2    0.48360    0.70940    0.682    0.496 

What I did was apply these coefficients as follows, considering only the significant ones:

rend = 66.24 - 24.26*wis + 129.36*wis^2 + 130.19*enz - 701.60*wis*enz

As an example, using the first values of the output of head(data):

rend = 66.24 - 24.26*10 + 129.36*10^2 + 130.19*3 - 701.60*10*3

Which results in 7897.79 which is different from estimated by fitted.values which was 67.02

What am I doing wrong?


Solution

  • The poly() function constructs an orthogonal polynomial basis by default, which is very different from the standard b0 + b1*x1 + b2*x2 + b12*x1*x2 ... parameterization. If you want the standard parameterization you can use poly(wiz, env, degree = 2, raw = TRUE).

    If you want to see what poly() is doing by default you can use model.matrix(fit) to see what's in the X matrix (in the sense of a linear model defined as y = X %*% beta) (the actual mathematical details/definition of how the polynomials are defined is a little deep: What does the R function `poly` really do? )