Search code examples
rregressionlmpolynomialspoly

R: How to plot custom range of polynomial produced by lm poly fit


I'm confused by the coefficients produced by the output of lm

Here's a copy of the data I'm working with

(postprocessed.csv)

"","time","value"
"1",1,2.61066016308988
"2",2,3.41246054742996
"3",3,3.8608767964033
"4",4,4.28686048552237
"5",5,4.4923132964825
"6",6,4.50557049744317
"7",7,4.50944447661246
"8",8,4.51097373134893
"9",9,4.48788748823809
"10",10,4.34603985656981
"11",11,4.28677073671406
"12",12,4.20065901625172
"13",13,4.02514194962519
"14",14,3.91360194972916
"15",15,3.85865748409081
"16",16,3.81318053258601
"17",17,3.70380706527433
"18",18,3.61552922363713
"19",19,3.61405310598722
"20",20,3.64591327503384
"21",21,3.70234435835577
"22",22,3.73503970503372
"23",23,3.81003078640584
"24",24,3.88201196162666
"25",25,3.89872518158949
"26",26,3.97432743542362
"27",27,4.2523675144599
"28",28,4.34654855854847
"29",29,4.49276038902684
"30",30,4.67830892029687
"31",31,4.91896819673664
"32",32,5.04350767355202
"33",33,5.09073406942046
"34",34,5.18510849382162
"35",35,5.18353176529036
"36",36,5.2210776270173
"37",37,5.22643491929207
"38",38,5.11137006553725
"39",39,5.01052467981257
"40",40,5.0361056705898
"41",41,5.18149486951409
"42",42,5.36334869132276
"43",43,5.43053620818444
"44",44,5.60001072279525

I have fitted a 4th order polynomial to this data using the following script:

library(ggplot2)
library(matrixStats)
library(forecast)

df_input <- read.csv("postprocessed.csv")

x <- df_input$time
y <- df_input$value
df <- data.frame(x, y)

poly4model <- lm(y~poly(x, degree=4), data=df)

v <- seq(30, 40)
vv <- poly4model$coefficients[1] +
  poly4model$coefficients[2] * v +
  poly4model$coefficients[3] * (v ^ 2) +
  poly4model$coefficients[4] * (v ^ 3) +
  poly4model$coefficients[5] * (v ^ 4)

pdf("postprocessed.pdf")
plot(df)
lines(v, vv, col="red", pch=20, lw=3)
dev.off()

I initially tried using the predict function to do this, but couldn't get that to work, so resorted to implementing this "workaround" using some new vectors v and vv to store the data for the line in the region I am trying to plot.

Ultimatly, I am trying to do this:

  • Fit a 4th order polynomial to the data
  • Plot the 4th order polynomial over the range of data in one color
  • Plot the 4th order polynomial over the range from the last value to the last value + 10 (prediction) in a different color

At the moment I am fairly sure using v and vv to do this is not "the best way", however I would have thought it should work. What is happening is that I get very large values.

Here is a screenshot from Desmos. I copied and pasted the same coefficients as shown by typing poly4model$coefficients into the console. However, something must have gone wrong because this function is nothing like the data.

I think I've provided enough info to be able to run this short script. However I will add the pdf as well.

desmos

pdf


Solution

  • It is easiest to use the predict function to create your line. To do that, you pass the model and a data frame with the desired independent variables to the predict function.

    x <- df_input$time
    y <- df_input$value
    df <- data.frame(x, y)
    
    poly4model <- lm(y~poly(x, degree=4), data=df)
    
    v <- seq(30, 40)
    #Notice the column in the dataframe is the same variable name 
    #     as the variable in the model!
    predict(poly4model, data.frame(x=v))
    
    plot(df)
    lines(v, predict(poly4model, data.frame(x=seq(30, 40))), col="red", pch=20, lw=3)
    

    enter image description here

    NOTE
    The function poly "Returns or evaluates orthogonal polynomials of degree 1 to degree over the specified set of points x: these are all orthogonal to the constant polynomial of degree 0." To return the "normal" polynomial coefficients one needs to use the "raw=TRUE" option in the function.

    poly4model <- lm(y~poly(x, degree=4, raw=TRUE), data=df)
    

    Now your equation above will work.