Tags: r, statistics, time-series, forecast

Forecasting a time series with multiple predictors returns an error


I have three quarterly time series: beer, temp, and income. All of them start in 2010 Q1 and end in 2018 Q3. Here is my data:

> beer
      Qtr1  Qtr2  Qtr3  Qtr4
2010 3.301 2.826 2.712 3.934
2011 3.192 2.975 2.865 3.789
2012 2.728 2.840 2.633 3.837
2013 3.090 2.779 2.594 3.960
2014 2.771 2.860 2.676 3.831
2015 2.986 2.558 2.810 3.743
2016 3.054 2.764 2.985 3.807
2017 3.046 2.880 2.689 4.005
2018 3.013 2.800 2.937      
> temp
          Qtr1      Qtr2      Qtr3      Qtr4
2010 16.766667 11.433333  9.400000 14.533333
2011 17.033333 11.966667  8.633333 13.900000
2012 15.800000 10.600000  9.700000 13.766667
2013 17.033333 11.333333 10.200000 14.866667
2014 16.266667 11.900000  9.266667 13.900000
2015 17.300000 11.400000  8.733333 13.966667
2016 18.033333 12.400000  9.300000 14.100000
2017 16.533333 11.100000  9.733333 15.300000
2018 18.400000 11.033333  9.700000
> income
       Qtr1   Qtr2   Qtr3   Qtr4
2010 48.064 47.755 47.878 47.707
2011 48.226 49.063 49.322 49.518
2012 49.714 49.390 49.683 50.386
2013 50.405 51.476 52.527 53.456
2014 54.309 54.308 54.811 54.723
2015 55.254 55.913 56.472 56.316
2016 58.013 58.312 58.744 59.806
2017 59.881 60.683 61.164 61.887
2018 61.969 62.507 63.054

I tried to forecast two years of beer values using trend and seasonal dummy predictors, but R always gives me a dimension error:

> forecast(tslm(beer~temp+income+trend+season), h = 8)
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  variable lengths differ (found for 'trend')
In addition: Warning message:
'newdata' had 8 rows but variables found have 35 rows 

I then tried using a data.frame, but that always produces warning messages:

> df = data.frame(beer,temp,income)
> forecast(tslm(beer~temp+income+trend+season, data = df), h = 8, newdata = df)
        Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
2018 Q4       2.991699 2.3132374 3.670161 1.9328516 4.050546
2019 Q1       1.752979 1.0424701 2.463488 0.6441168 2.861841
2019 Q2       1.667426 0.9738984 2.360954 0.5850656 2.749787
2019 Q3       1.875770 1.0662253 2.685315 0.6123465 3.139194
2019 Q4       2.884266 2.1308413 3.637691 1.7084267 4.060105
2020 Q1       1.729527 1.0011085 2.457945 0.5927141 2.866339
2020 Q2       1.599902 0.8838936 2.315910 0.4824569 2.717347
2020 Q3       1.837085 1.0376823 2.636488 0.5894896 3.084681
2020 Q4       2.800613 2.0470872 3.554139 1.6246159 3.976610
2021 Q1       1.566346 0.7583452 2.374347 0.3053320 2.827360
2021 Q2       1.537637 0.7593199 2.315954 0.3229493 2.752324
2021 Q3       1.758491 0.9202322 2.596749 0.4502548 3.066726
2021 Q4       2.766178 1.9445748 3.587782 1.4839351 4.048421
2022 Q1       1.600676 0.8060401 2.395313 0.3605199 2.840833
2022 Q2       1.610888 0.8665356 2.355241 0.4492074 2.772569
2022 Q3       1.870518 1.0513857 2.689650 0.5921317 3.148904
2022 Q4       2.855698 2.1234601 3.587935 1.7129243 3.998471
2023 Q1       1.675867 0.9187581 2.432976 0.4942778 2.857457
2023 Q2       1.590225 0.8580061 2.322445 0.4474806 2.732970
2023 Q3       1.783578 0.9603794 2.606776 0.4988456 3.068310
2023 Q4       2.829362 2.0411286 3.617595 1.5991983 4.059525
2024 Q1       1.629442 0.8509889 2.407896 0.4145418 2.844343
2024 Q2       1.546023 0.7994307 2.292615 0.3808469 2.711199
2024 Q3       1.759382 0.9209619 2.597803 0.4508937 3.067871
2024 Q4       2.906656 2.1369607 3.676351 1.7054240 4.107887
2025 Q1       1.694576 0.9426298 2.446521 0.5210444 2.868107
2025 Q2       1.585464 0.8512783 2.319649 0.4396504 2.731277
2025 Q3       1.858994 1.0774412 2.640548 0.6392561 3.078733
2025 Q4       2.836440 2.0876545 3.585226 1.6678407 4.005040
2026 Q1       1.664587 0.9073179 2.421857 0.4827478 2.846427
2026 Q2       1.628942 0.9118032 2.346081 0.5097325 2.748152
2026 Q3       1.911943 1.1070396 2.716846 0.6557631 3.168123
2026 Q4       2.916889 2.1414033 3.692375 1.7066199 4.127158
2027 Q1       1.649728 0.8839868 2.415469 0.4546670 2.844789
2027 Q2       1.619649 0.8980352 2.341262 0.4934558 2.745842
Warning messages:
1: In forecast.lm(tslm(beer ~ temp + income + trend + season, data = df),  :
  Could not find required variable temp in newdata. Specify newdata as a named data.frame
2: In forecast.lm(tslm(beer ~ temp + income + trend + season, data = df),  :
  Could not find required variable income in newdata. Specify newdata as a named data.frame

I tried renaming the columns in the data frame. This time it runs without warnings, but the plot doesn't look right:

> names(df)[2] = "temp"
> names(df)[3] = "income"
> autoplot(forecast(tslm(beer~temp+income+trend+season, data = df), h = 8, newdata = df))

But when I exclude the predictors temp and income, it works well:

> forecast(tslm(beer~trend+season), h = 8)
        Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
2018 Q4       3.854655 3.654084 4.055226 3.542067 4.167244
2019 Q1       3.010562 2.809372 3.211751 2.697010 3.324113
2019 Q2       2.799562 2.598372 3.000751 2.486010 3.113113
2019 Q3       2.757228 2.556039 2.958417 2.443676 3.070780
2019 Q4       3.852745 3.648494 4.056997 3.534421 4.171070
2020 Q1       3.008652 2.803430 3.213874 2.688815 3.328489
2020 Q2       2.797652 2.592430 3.002874 2.477815 3.117489
2020 Q3       2.755318 2.550096 2.960540 2.435481 3.075155

I want to forecast two years of beer values with temp, income, trend, and seasonal dummies as predictors. I have tried everything I know. Please help. Thanks in advance.


Solution

  • There are a couple of problems here. The first is that you are passing the historical temp and income data in the newdata argument, when newdata should hold future values of these variables. This also explains the long output: when newdata is supplied, forecast() ignores h and returns one forecast per row of newdata, so you get 35 forecasts rather than 8. The second issue is that the forecast package is not particularly good at finding the relevant variables in newdata and is getting confused here. Workarounds are possible, but I suggest you use the newer fable package instead of forecast, which makes this sort of thing much easier.

    library(tidyverse)
    library(lubridate)
    library(tsibble)
    library(fable)
    
    df <- tsibble(
      quarter = seq(yearquarter("2010 Q1"), to=yearquarter("2018 Q3"), by = 1),
      beer = c(
        3.301, 2.826, 2.712, 3.934, 3.192, 2.975, 2.865, 3.789,
        2.728, 2.840, 2.633, 3.837, 3.090, 2.779, 2.594, 3.960,
        2.771, 2.860, 2.676, 3.831, 2.986, 2.558, 2.810, 3.743,
        3.054, 2.764, 2.985, 3.807, 3.046, 2.880, 2.689, 4.005,
        3.013, 2.800, 2.937
      ),
      temp = c(
        16.766667, 11.433333, 9.400000, 14.533333, 17.033333, 11.966667, 8.633333, 13.900000,
        15.800000, 10.600000, 9.700000, 13.766667, 17.033333, 11.333333, 10.200000, 14.866667,
        16.266667, 11.900000, 9.266667, 13.900000, 17.300000, 11.400000, 8.733333, 13.966667,
        18.033333, 12.400000, 9.300000, 14.100000, 16.533333, 11.100000, 9.733333, 15.300000,
        18.400000, 11.033333, 9.700000
      ),
      income = c(
        48.064, 47.755, 47.878, 47.707, 48.226, 49.063, 49.322, 49.518,
        49.714, 49.390, 49.683, 50.386, 50.405, 51.476, 52.527, 53.456,
        54.309, 54.308, 54.811, 54.723, 55.254, 55.913, 56.472, 56.316,
        58.013, 58.312, 58.744, 59.806, 59.881, 60.683, 61.164, 61.887,
        61.969, 62.507, 63.054
      ),
      index = quarter
    )
    df
    #> # A tsibble: 35 x 4 [1Q]
    #>    quarter  beer  temp income
    #>      <qtr> <dbl> <dbl>  <dbl>
    #>  1 2010 Q1  3.30 16.8    48.1
    #>  2 2010 Q2  2.83 11.4    47.8
    #>  3 2010 Q3  2.71  9.4    47.9
    #>  4 2010 Q4  3.93 14.5    47.7
    #>  5 2011 Q1  3.19 17.0    48.2
    #>  6 2011 Q2  2.98 12.0    49.1
    #>  7 2011 Q3  2.86  8.63   49.3
    #>  8 2011 Q4  3.79 13.9    49.5
    #>  9 2012 Q1  2.73 15.8    49.7
    #> 10 2012 Q2  2.84 10.6    49.4
    #> # … with 25 more rows
    
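    # Hold out 2017 onwards so the test set can stand in for "future" predictor values
    train <- df %>% filter(year(quarter) <= 2016)
    test <- df %>% filter(year(quarter) > 2016)
    # Fit on the training data, then forecast using the held-out temp and income values
    fc <- train %>%
      model(TSLM(beer ~ temp + income + trend() + season())) %>%
      forecast(new_data = test)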
    

    Created on 2020-04-29 by the reprex package (v0.3.0)
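
The example above forecasts over a held-out test set, where temp and income are already known. To forecast the eight quarters after 2018 Q3, the model needs future values of both predictors. The following is a minimal sketch, not part of the original answer, that fills them in with placeholder scenarios: temp repeats its last four observed quarters, and income stays at its last observed value. Both scenarios are assumptions for illustration only and should be replaced with real forecasts or planned values.

```r
library(dplyr)
library(tsibble)
library(fable)

# Same data as in the question: 35 quarters from 2010 Q1 to 2018 Q3
df <- tsibble(
  quarter = yearquarter("2010 Q1") + 0:34,
  beer = c(
    3.301, 2.826, 2.712, 3.934, 3.192, 2.975, 2.865, 3.789,
    2.728, 2.840, 2.633, 3.837, 3.090, 2.779, 2.594, 3.960,
    2.771, 2.860, 2.676, 3.831, 2.986, 2.558, 2.810, 3.743,
    3.054, 2.764, 2.985, 3.807, 3.046, 2.880, 2.689, 4.005,
    3.013, 2.800, 2.937
  ),
  temp = c(
    16.766667, 11.433333, 9.400000, 14.533333, 17.033333, 11.966667, 8.633333, 13.900000,
    15.800000, 10.600000, 9.700000, 13.766667, 17.033333, 11.333333, 10.200000, 14.866667,
    16.266667, 11.900000, 9.266667, 13.900000, 17.300000, 11.400000, 8.733333, 13.966667,
    18.033333, 12.400000, 9.300000, 14.100000, 16.533333, 11.100000, 9.733333, 15.300000,
    18.400000, 11.033333, 9.700000
  ),
  income = c(
    48.064, 47.755, 47.878, 47.707, 48.226, 49.063, 49.322, 49.518,
    49.714, 49.390, 49.683, 50.386, 50.405, 51.476, 52.527, 53.456,
    54.309, 54.308, 54.811, 54.723, 55.254, 55.913, 56.472, 56.316,
    58.013, 58.312, 58.744, 59.806, 59.881, 60.683, 61.164, 61.887,
    61.969, 62.507, 63.054
  ),
  index = quarter
)

# Build the next 8 quarters (2018 Q4 to 2020 Q3) and attach ASSUMED predictor values
future <- new_data(df, n = 8) %>%
  mutate(
    # Assumption: temp repeats its last four observed quarters (2017 Q4 to 2018 Q3)
    temp = rep(tail(df$temp, 4), length.out = 8),
    # Assumption: income stays flat at its last observed value
    income = rep(tail(df$income, 1), 8)
  )

# Fit on all available data, with trend and seasonal dummies added by fable
fit <- df %>%
  model(TSLM(beer ~ temp + income + trend() + season()))

# One forecast per row of `future`
fc <- fit %>% forecast(new_data = future)
fc
```

The forecast horizon comes from the number of rows in future, so no h argument is needed. Different scenarios for temp and income can be compared by building several such tables and forecasting each one.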