
Creating a data frame in R from forecasts to use with predict


I need help creating a data frame in R that contains a separate column for each forecasted variable. This "new data" will then be passed to predict after fitting a regression.

The classical example online is as follows:

data=read.csv("some_path", header=TRUE)

fit<- with(data, lm(y1 ~x1))
pred<-predict.lm(fit, newdata=data.frame("x1"=some_number))

The model has three steps:

  1. Perform simple auto.arima forecasts for independent variables x1...xn.
fit_x1<- auto.arima(x1, stepwise = FALSE)

for_x1<-forecast(fit_x1, h=10)

...

fit_xn<- auto.arima(xn, stepwise = FALSE)

for_xn<-forecast(fit_xn, h=10)


After forecasting, I need to create a data frame that contains each forecasted variable (x1...xn). The columns of this "new data" must have exactly the same names as the X's used later in the regression, and they should hold only the point forecasts, without the Hi and Lo values.

  2. Run a regression between y1 and x1...xn.
fit<- lm(formula=log(y1)~log(x1)+log(x2)+...+log(xn), data=data)

summary(fit)
  3. Perform a prediction using the results from "fit" and the "new data" from step 1.
pred<- predict.lm(fit, newdata)

There is a very interesting post on this issue: "Combining forecasts into a data frame in R and then exporting into excel". However, in that case the new data frame has only one column, in which the forecasts from the different x's are combined.

Can this three-step procedure also work for panel time series data?


Solution

  • I think the following code should help you create a newdata data.frame.

    Libraries required to reproduce it:

    # library(dplyr)
    # library(stats)
    # library(forecast)
    

    The following code creates an example data set with 5 time series:

    set.seed(123)
    
    dta <- ts(dplyr::tibble(
      AA = arima.sim(list(order=c(1,0,0), ar=.5), n=100, mean = 12), 
      AB = arima.sim(list(order=c(1,0,0), ar=.5), n=100, mean = 12), 
      AC = arima.sim(list(order=c(1,0,0), ar=.5), n=100, mean = 11), 
      BA = arima.sim(list(order=c(1,0,0), ar=.5), n=100, mean = 10), 
      BB = arima.sim(list(order=c(1,0,0), ar=.5), n=100, mean = 14)
    ), start = c(2013, 1), frequency = 12)
    
    head(dta)
    tail(dta)
    

    Now we run the batch forecasts and collect the point forecasts (the $mean component of each forecast) in a single newdata matrix.

    nseries <- ncol(dta)
    h <- 12 # forecast horizon
    
    newdata <- matrix(nrow = h, ncol = nseries) # empty newdata matrix
    
    for (i in seq_len(nseries)) {
      
      newdata[,i] <- forecast::forecast(forecast::auto.arima(dta[,i]), h = h)$mean
    }
    
    colnames(newdata) <- colnames(dta)
    
    head(newdata)
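
    Note that predict expects a data frame rather than a matrix, and it matches columns by name. The sketch below shows one way the last step could look. It is only an illustration under assumptions: the data, the coefficients, and the two-column newdata matrix are hypothetical stand-ins (base R only) for your real series and for the forecast matrix built above.

    ```r
    # Hypothetical stand-in data: y1 depends on two positive regressors x1 and x2.
    set.seed(1)
    data <- data.frame(x1 = runif(50, 5, 15), x2 = runif(50, 5, 15))
    data$y1 <- exp(0.5 + 0.8 * log(data$x1) - 0.3 * log(data$x2) +
                     rnorm(50, sd = 0.05))

    # Stand-in for the matrix of point forecasts (h = 3 rows here); in practice
    # this would be the newdata matrix filled from forecast(...)$mean.
    newdata <- matrix(c(10.2, 10.4, 10.1,
                        11.8, 12.1, 12.0),
                      nrow = 3, dimnames = list(NULL, c("x1", "x2")))

    # predict() looks variables up by name, so convert the matrix to a data
    # frame whose columns carry the raw variable names (x1, not log(x1)).
    newdata_df <- as.data.frame(newdata)

    fit <- lm(log(y1) ~ log(x1) + log(x2), data = data)

    # The log() terms in the formula are applied to newdata_df automatically.
    # Predictions are on the log scale because the response is log(y1).
    pred_log <- predict(fit, newdata = newdata_df)

    # Naive back-transform to the scale of y1 (ignores retransformation bias).
    pred <- exp(pred_log)
    pred
    ```

    Because the formula itself contains the log() calls, the new data should hold the untransformed forecasts; logging them beforehand would apply the transformation twice.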
    

    I hope I understood the problem correctly.