Search code examples
rmodelinglmdplyrpredict

Identical quadratic and cubic predictions


For the following data:

require(dplyr)
require(ggplot2)

ds <- read.table(header = TRUE, text ="
obs  id year attend
                 1  47 2000      1
                 2  47 2001      3
                 3  47 2002      5
                 4  47 2003      8
                 5  47 2004      6
                 6  47 2005      4
                 7  47 2006      2
                 8  47 2007      1
                 9  47 2008      2
                 10 47 2009      3
                 11 47 2010      4
                 12 47 2011      5

                 ")
print(ds)

I would like to compute predicted values of linear models

linear<-    predict(lm(attend ~ year, ds))
quadratic<- predict(lm(attend ~ year + I(year^2),ds))
cubic<-     predict(lm(attend ~ year + I(year^2) + I(year^3),ds))

ds<- ds %>% dplyr::mutate(linear=linear, quadratic=quadratic, cubic=cubic)
print(ds)

   obs id year attend   linear quadratic    cubic
1    1 47 2000      1 3.820513  3.500000 3.500000
2    2 47 2001      3 3.792541  3.646853 3.646853
3    3 47 2002      5 3.764569  3.758741 3.758741
4    4 47 2003      8 3.736597  3.835664 3.835664
5    5 47 2004      6 3.708625  3.877622 3.877622
6    6 47 2005      4 3.680653  3.884615 3.884615
7    7 47 2006      2 3.652681  3.856643 3.856643
8    8 47 2007      1 3.624709  3.793706 3.793706
9    9 47 2008      2 3.596737  3.695804 3.695804
10  10 47 2009      3 3.568765  3.562937 3.562937
11  11 47 2010      4 3.540793  3.395105 3.395105
12  12 47 2011      5 3.512821  3.192308 3.192308

Question: Despite the fact that time series has a clear cubical shape, the quadratic and cubic predictions are identical. Why? Is this a mistake?


Solution

  • This is due to the fact 2011^3 is a very big number (greater tha and this is causing the coeffiicent to be returned as NA. If you had inspected the models, you would have noticed this.

    coef(lm(attend ~ year + I(year^2) + I(year^3),ds))
    #   (Intercept)          year     I(year^2)     I(year^3) 
    # -7.025524e+04  7.009441e+01 -1.748252e-02            NA 
    

    It is more sensible to use poly to create orthogonal polynomials

    linear<-    predict(lm(attend ~ year, ds))
    quadratic<- predict(lm(attend ~ poly(year,2),ds))
    cubic<-     predict(lm(attend ~ poly(year,3),ds))
    
    
    ds<- (ds %>% dplyr::mutate(linear=linear, quadratic=quadratic, cubic=cubic))
    ds
    # obs id year attend   linear quadratic     cubic
    # 1    1 47 2000      1 3.820513  3.500000 0.7435897
    # 2    2 47 2001      3 3.792541  3.646853 3.8974359
    # 3    3 47 2002      5 3.764569  3.758741 5.5128205
    # 4    4 47 2003      8 3.736597  3.835664 5.9238539
    # 5    5 47 2004      6 3.708625  3.877622 5.4646465
    # 6    6 47 2005      4 3.680653  3.884615 4.4693085
    # 7    7 47 2006      2 3.652681  3.856643 3.2719503
    # 8    8 47 2007      1 3.624709  3.793706 2.2066822
    # 9    9 47 2008      2 3.596737  3.695804 1.6076146
    # 10  10 47 2009      3 3.568765  3.562937 1.8088578
    # 11  11 47 2010      4 3.540793  3.395105 3.1445221
    # 12  12 47 2011      5 3.512821  3.192308 5.9487179