Search code examples
rregressionpredictionforecasting

Predict Future values using polynomial regression in R


Was trying to predict the future value of a sample using polynomial regression in R. The y values within the sample forms a wave pattern. For example

x = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
y= 1,2,3,4,5,4,3,2,1,0,1,2,3,4,5,4

But when the graph is plotted for future values the resultant y values was completely different from what was expected. Instead of a wave pattern, was getting a graph where the y values keep increasing.

futurY = 17,18,19,20,21,22

Tried different degrees of polynomial regression, but the predicted results for futurY were drastically different from what was expected

Following is the sample R code which was used to get the results

dfram <- data.frame('x'=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16))
dfram$y <- c(1,2,3,4,5,4,3,2,1,0,1,2,3,4,5,4)
plot(dfram,dfram$y,type="l", lwd=3)
pred <- data.frame('x'=c(17,18,19,20,21,22))
myFit <- lm(y ~ poly(x,5), data=dfram)
newdata <- predict(myFit, pred)
print(newdata)
plot(pred[,1],data.frame(newdata)[,1],type="l",col="red", lwd=3)

Is this the correct technique to be used for predicting the unknown future y values OR should I be using other techniques like forecasting?


Solution

  • # Reproducing your data frame
    dfram <- data.frame("x" = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16),
                        "y" = c(1,2,3,4,5,4,3,2,1,0,1,2,3,4,5,4))
    

    From your graph I've got the phase and period of the signal. There're better ways of calculating that automatically.

    # Phase and period
    fase = 1
    per = 10
    

    In the linear model function I've put the triangular signal equations.

    fit <- lm(y ~ I((((trunc((x-fase)/(per/2))%%2)*2)-1) * (x-fase)%%(per/2))
                + I((((trunc((x-fase)/(per/2))%%2)*2)-1) * ((per/2)-((x-fase)%%(per/2))))
              ,data=dfram)
    
    # Predict the old data
    p_olddata <- predict(fit,type="response")
    
    # Predict the new data
    newdata <- data.frame('x'=c(17,18,19,20,21,22))
    p_newdata <- predict(fit,newdata,type="response")
    
    # Ploting Old and new data
    plot(x=c(dfram$x,newdata$x),
         y=c(p_olddata,p_newdata),
         col=c(rep("blue",length(p_olddata)),rep("green",length(p_olddata))),
         xlab="x",
         ylab="y")
    lines(dfram)
    

    enter image description here

    Where the black line is the original signal, the blue circles are the prediction for the original points and the green circles are the prediction for the new data.

    The graph shows a perfect fit for the model because there's no noise in the data. In a real dataset you may find it so the fit will not look as nice as that.