Tags: r, model, regression, polynomial-approximations

Why is the lm function giving absurdly high results?


First off, I'll give you some reproducible code:

library(ggplot2)

y = c(0, 0, 1, 2, 0, 0, 1, 3, 0, 0, 3, 0, 6, 2, 8, 16, 21, 39, 48, 113,
      92, 93, 127, 159, 137, 46, 238, 132, 124, 185, 171, 250, 250, 187, 119, 151, 292, 94, 281, 146,
      163, 104, 156, 272, 273, 212, 210, 135, 187, 208, 310, 276, 235, 246, 190, 232, 254, 446, 314, 402,
      276, 279, 386, 402, 238, 581, 434, 159, 261, 356, 440, 498, 495, 462, 306, 233, 396, 331, 418, 293,
      431, 300, 222, 222, 479, 501, 702, 790, 681)
x = 1:length(y)

Now, I'm trying to fit a 3rd-degree polynomial regression curve to this dataset. To see the model's coefficients, I ran the call below, and it returned the following output:
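summary(lm(formula = y ~ poly(x, 3)))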

Call:
lm(formula = y ~ poly(x, 3))

Residuals:
     Min       1Q   Median       3Q      Max 
-253.696  -47.582   -9.709   44.314  271.183 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  223.978      9.703  23.083   <2e-16 ***
poly(x, 3)1 1420.644     91.538  15.520   <2e-16 ***
poly(x, 3)2   62.375     91.538   0.681    0.497    
poly(x, 3)3  130.161     91.538   1.422    0.159    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 91.54 on 85 degrees of freedom
Multiple R-squared:  0.7411,    Adjusted R-squared:  0.732 
F-statistic: 81.12 on 3 and 85 DF,  p-value: < 2.2e-16

These coefficient estimates seem absurdly high for my model, and I'm confused about why this output is being returned.

Why is this happening? Where am I going wrong?


Solution

  • I think what you want is:

    lm(y ~ poly(x, 3, raw = TRUE))
    

    By default, poly(x, 3) builds orthogonal polynomials, so the coefficients in your summary apply to that orthogonal basis, not to x, x^2 and x^3 directly; that's why they look so large. With raw = TRUE you get coefficients on the raw powers of x. The fitted curve is the same either way; only the parameterization of the coefficients changes (see the sketch below).

    I hope this helps!
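
    Below is a minimal sketch, assuming the x and y defined in the question, that compares the two parameterizations and confirms they produce the same fitted values:

    # Both parameterizations give the same fitted curve; only the
    # coefficients differ.
    fit_orth <- lm(y ~ poly(x, 3))              # orthogonal basis (default)
    fit_raw  <- lm(y ~ poly(x, 3, raw = TRUE))  # raw powers x, x^2, x^3

    coef(fit_raw)                                 # coefficients of 1, x, x^2, x^3
    all.equal(fitted(fit_orth), fitted(fit_raw))  # TRUE: identical fitted values

    # Optional visual check with ggplot2 (already loaded in the question)
    ggplot(data.frame(x, y), aes(x, y)) +
      geom_point() +
      stat_smooth(method = "lm", formula = y ~ poly(x, 3), se = FALSE)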