
Fitting a polynomial model to data in R


Year <- c(1000,1500,1600,1700,1750,1800,1850,1900,1950,1955,1960,1965,1970,1975,1980,1985,1990,1995,2000,2005,2010,2015)

Africa <- c(70,86,114,106,106,107,111,133,229,254,285,322,366,416,478,550,632,720,814,920,1044,1186)

How can I estimate the population for the years 1925, 1963, 1978, 1988, and 1998 using polynomial linear regression?


Solution

  • Here is a starting point for solving your problem.

    Year <- c(1000,1500,1600,1700,1750,1800,1850,1900,1950,1955,1960,1965,
              1970,1975,1980,1985,1990,1995,2000,2005,2010,2015)
    Africa <- c(70,86,114,106,106,107,111,133,229,254,285,322,366,416,478,550,
                632,720,814,920,1044,1186)
    df <- data.frame(Year, Africa)
    
    # Polynomial linear regression of order 5
    model1 <- lm(Africa ~ poly(Year,5), data=df)
    summary(model1)
    
    ###########
    Call:
    lm(formula = Africa ~ poly(Year, 5), data = df)
    
    Residuals:
        Min      1Q  Median      3Q     Max 
    -59.639 -27.119 -12.397   9.149  97.398 
    
    Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
    (Intercept)      411.32      10.12  40.643  < 2e-16 ***
    poly(Year, 5)1   881.26      47.47  18.565 3.01e-12 ***
    poly(Year, 5)2   768.50      47.47  16.190 2.42e-11 ***
    poly(Year, 5)3   709.43      47.47  14.945 8.07e-11 ***
    poly(Year, 5)4   628.45      47.47  13.239 4.89e-10 ***
    poly(Year, 5)5   359.04      47.47   7.564 1.14e-06 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    Residual standard error: 47.47 on 16 degrees of freedom
    Multiple R-squared:  0.9852,    Adjusted R-squared:  0.9805 
    F-statistic: 212.5 on 5 and 16 DF,  p-value: 4.859e-14
    #############
    
    pred <- predict(model1)
    plot(Year, Africa, type="o", xlab="Year", ylab="Africa")
    lines(Year, pred, lwd=2, col="red")
    


    The model estimated above fits the years before 1900 poorly. It is therefore preferable to fit a model only to the data after 1900 (here, the 14 observations from 1950 to 2015, since Year>1900 also excludes 1900 itself).
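    One way to check this is to compare the size of the residuals of the order-5 model before and after 1900. The following is a minimal sketch; the names `period`, `rmse_by_period` and `rel_err_by_period` are introduced here for illustration and are not part of the original answer.

```r
Year <- c(1000,1500,1600,1700,1750,1800,1850,1900,1950,1955,1960,1965,
          1970,1975,1980,1985,1990,1995,2000,2005,2010,2015)
Africa <- c(70,86,114,106,106,107,111,133,229,254,285,322,366,416,478,550,
            632,720,814,920,1044,1186)
df <- data.frame(Year, Africa)
model1 <- lm(Africa ~ poly(Year, 5), data = df)

res <- residuals(model1)
# Label each observation by period
period <- ifelse(df$Year < 1900, "before_1900", "from_1900")

# Root mean squared residual per period
rmse_by_period <- tapply(res, period, function(r) sqrt(mean(r^2)))
# Mean absolute residual relative to the observed value per period
rel_err_by_period <- tapply(abs(res) / df$Africa, period, mean)

print(rmse_by_period)
print(rel_err_by_period)
```

    The relative error is included because the observed values before 1900 are small, so a residual of a given size matters more there.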

    # Polynomial linear regression of order 2
    df2 <- subset(df,Year>1900)
    model2 <- lm(Africa ~ poly(Year,2), data=df2)
    summary(model2)
    
    ###########
    Call:
    lm(formula = Africa ~ poly(Year, 2), data = df2)
    
    Residuals:
       Min     1Q Median     3Q    Max 
    -9.267 -2.489 -0.011  3.334 12.482 
    
    Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
    (Intercept)     586.857      1.677  349.93  < 2e-16 ***
    poly(Year, 2)1 1086.646      6.275  173.17  < 2e-16 ***
    poly(Year, 2)2  245.687      6.275   39.15 3.65e-13 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    Residual standard error: 6.275 on 11 degrees of freedom
    Multiple R-squared:  0.9997,    Adjusted R-squared:  0.9996 
    F-statistic: 1.576e+04 on 2 and 11 DF,  p-value: < 2.2e-16
    ###########
    
    df2$pred <- predict(model2)
    plot(df2$Year, df2$Africa, type="o", xlab="Year", ylab="Africa")
    lines(df2$Year, df2$pred, lwd=2, col="red")
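    The coefficients reported by summary() refer to the orthogonal polynomial basis that poly() builds by default, so they are not the coefficients of Year and Year^2. If coefficients on the original scale are needed, the same curve can be fitted with raw = TRUE. A sketch, where `model2_raw` and `same_fit` are names introduced here for illustration:

```r
Year <- c(1000,1500,1600,1700,1750,1800,1850,1900,1950,1955,1960,1965,
          1970,1975,1980,1985,1990,1995,2000,2005,2010,2015)
Africa <- c(70,86,114,106,106,107,111,133,229,254,285,322,366,416,478,550,
            632,720,814,920,1044,1186)
df2 <- subset(data.frame(Year, Africa), Year > 1900)

# Orthogonal basis (the default used in the answer)
model2 <- lm(Africa ~ poly(Year, 2), data = df2)
# Same quadratic, parameterised directly in Year and Year^2
model2_raw <- lm(Africa ~ poly(Year, 2, raw = TRUE), data = df2)

print(coef(model2_raw))

# Both parameterisations should give the same fitted values
same_fit <- isTRUE(all.equal(unname(fitted(model2)),
                             unname(fitted(model2_raw)),
                             tolerance = 1e-6))
print(same_fit)
```

    The orthogonal basis is usually the safer default: raw powers of values as large as these years are strongly correlated, which can make higher-order fits numerically unstable.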
    

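    The second model fits its data much more closely: the residual standard error is 6.275 (against 47.47 for the first model) and the adjusted R-squared is 0.9996 (against 0.9805).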
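    Whether order 2 is enough for the post-1900 data can be checked by comparing it with a higher-order fit. A sketch using the F-test for nested models from anova(); choosing order 3 as the alternative, and the names `model3` and `cmp`, are assumptions made here for illustration.

```r
Year <- c(1000,1500,1600,1700,1750,1800,1850,1900,1950,1955,1960,1965,
          1970,1975,1980,1985,1990,1995,2000,2005,2010,2015)
Africa <- c(70,86,114,106,106,107,111,133,229,254,285,322,366,416,478,550,
            632,720,814,920,1044,1186)
df2 <- subset(data.frame(Year, Africa), Year > 1900)

model2 <- lm(Africa ~ poly(Year, 2), data = df2)
# Hypothetical alternative: one extra polynomial term
model3 <- lm(Africa ~ poly(Year, 3), data = df2)

# F-test of the order-2 model against the order-3 model
cmp <- anova(model2, model3)
print(cmp)
```

    A small p-value in the second row would suggest that the cubic term improves the fit; otherwise the simpler order-2 model is adequate.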

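    Finally, we obtain the model predictions for the years 1925, 1963, 1978, 1988, and 1998. Note that 1925 lies outside the range used to fit the model (1950 to 2015), so its prediction is an extrapolation: the predicted 286.49 is higher than the 229 observed in 1950, whereas the data show 133 in 1900, so this value should not be trusted.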

    df3 <- data.frame(Year=c(1925, 1963, 1978, 1988, 1998))
    df3$pred <- predict(model2, newdata=df3)
    df3
    
      Year     pred
    1 1925 286.4863
    2 1963 301.1507
    3 1978 451.7210
    4 1988 597.6301
    5 1998 779.9623
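    Point predictions alone do not show how uncertain they are. The following sketch adds 95% prediction intervals and flags years outside the fitted range; the names `new_years`, `pi95`, `out` and the `extrapolated` column are introduced here for illustration.

```r
Year <- c(1000,1500,1600,1700,1750,1800,1850,1900,1950,1955,1960,1965,
          1970,1975,1980,1985,1990,1995,2000,2005,2010,2015)
Africa <- c(70,86,114,106,106,107,111,133,229,254,285,322,366,416,478,550,
            632,720,814,920,1044,1186)
df2 <- subset(data.frame(Year, Africa), Year > 1900)
model2 <- lm(Africa ~ poly(Year, 2), data = df2)

new_years <- data.frame(Year = c(1925, 1963, 1978, 1988, 1998))

# interval = "prediction" returns a matrix with columns fit, lwr, upr
pi95 <- predict(model2, newdata = new_years,
                interval = "prediction", level = 0.95)
out <- cbind(new_years, pi95)

# Flag years outside the range of the data used for fitting
out$extrapolated <- out$Year < min(df2$Year) | out$Year > max(df2$Year)
print(out)
```

    The intervals rely on the usual linear-model assumptions (independent, normally distributed errors), which may hold only roughly for a time series like this one.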