r linear-regression forecasting stress-testing

How to forecast y values using linear regression model on new x values


I have a multivariate linear regression model:

model <- lm(y ~ a + b + c, data = df)

Let's say the historical data for y, a, b, and c are quarterly, covering 2000-2017.

Date    y    a    b    c
2000Q1  2    1.5  1.3  8.1
2000Q2  2.3  1.8  1.2  7.6
.       .    .    .    .
.       .    .    .    .
.       .    .    .    .
.       .    .    .    .
2017Q4  8.7  3.5  5.6  3.2

Now that I have my linear model, I want to forecast y using new data for a, b, and c covering 2017-2020; let's call them a2, b2, and c2.

Date    a2   b2   c2
2017Q4  3.5  5.6  3.2
2018Q1  4.1  6.3  3.0
.       .    .    .
.       .    .    .
.       .    .    .
2020Q4  5.6  7.8  2.2

How do I use the linear model from my previous set of historical/actual data (a, b, and c), and forecast y against the newer values of x (a2, b2, and c2)?

I have tried using the predict() and predict.lm() functions, but neither gives me the results I am looking for. I can manually type in the linear model and create these forecasts, but I'm sure there is a more efficient way to do this.

Update

Here is a small example of what I am doing:

df <- data.frame(y = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                 a = c(2, 2.3, 2.6, 2.9, 2.4, 2.6, 3.0, 3.2, 3.9, 3.7),
                 b = c(9, 8.7, 9.1, 7.8, 8.2, 8, 6.9, 7.8, 9.1, 5.7))

attach(df)

model <- lm(y ~ a + b)

df2 <- data.frame(a2 = c(3.7, 4.0, 5.2, 5.6, 5.8, 6),
              b2 = c(5.7, 5.5, 5.3, 5.1, 4.9, 4.7))

predict(model, newdata = df2)

I keep getting the fitted values from the original model, along with a warning message:

        1         2         3         4         5         6         7         8         9        10 
 1.409122  2.807886  3.690647  5.826560  3.569001  4.501510  6.882534  7.004180  8.793667 10.514892 
Warning message:
'newdata' had 6 rows but variables found have 10 rows 

Solution

  • Updated to match added example

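    The column names in newdata must match the predictor names used in the model formula. predict() looks up each predictor by name; df2 only has columns a2 and b2, so it finds no a or b there, falls back to the a and b made visible by attach(df), and returns the 10 fitted values together with the warning.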

    Using your updated example, make the names in df2 match the names in df before running predict.

    names(df2) <- c("a", "b")
    predict(model, newdata = df2)
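
For reference, here is a minimal end-to-end sketch of the same idea using the numbers from the question's small example. It makes two changes that are my suggestions rather than part of the original code: the model is fitted with data = df instead of attach(df), and df2 is built with columns already named a and b. The interval = "prediction" argument and the result object are only illustrative additions.

```r
# Historical data from the question's small example.
df <- data.frame(y = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                 a = c(2, 2.3, 2.6, 2.9, 2.4, 2.6, 3.0, 3.2, 3.9, 3.7),
                 b = c(9, 8.7, 9.1, 7.8, 8.2, 8, 6.9, 7.8, 9.1, 5.7))

# Pass data = df instead of calling attach(df), so the model does not
# depend on variables that happen to be on the search path.
model <- lm(y ~ a + b, data = df)

# New predictor values; the columns are named a and b to match the formula.
df2 <- data.frame(a = c(3.7, 4.0, 5.2, 5.6, 5.8, 6),
                  b = c(5.7, 5.5, 5.3, 5.1, 4.9, 4.7))

# One forecast per row of df2, with 95% prediction intervals
# (columns fit, lwr, upr).
forecast <- predict(model, newdata = df2, interval = "prediction", level = 0.95)

# Keep the inputs and the forecasts side by side.
result <- cbind(df2, forecast)
print(result)
```

With data = df and no attach(), a misnamed column in newdata should produce an "object not found" error instead of silently returning the original fitted values, which makes this kind of mistake easier to spot.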