Search code examples
rloopsforecast

Evaluating forecast accuracy with ETS and loop function


I am foresting with combination of datasets from fpp2 package and ets function from the forecast package.Because I forecasting several time series I use own function which make several projections simultaneously.

# CODE
library(fpp2) # required for the data
library(dplyr)
library(forecast)

MY_DATA<-uschange[,1:4]
head(MY_DATA)
tail(MY_DATA)

#1. Own forecasting function            
FORECASTING_FUNCTION_ETS <- function(Z, hrz = 16) {
  timeseries <- msts(Z, start = 1970, seasonal.periods = 4)
  forecast <- ets(timeseries)
}

In order to get more accurate projection I want to use partitioning.Partitioning is done by trimming the series into two periods. The earlier period is the training set and the later period is the test set.

#2.Partitioning (training and test set)

        for (i in 1:20)
        { nTest <- 16*i  
        nTrain <- length(MY_DATA[,2:2])- nTest
        train <- window(MY_DATA[,2:2],start=1970, end=c(2015,3),nTrain)
        test <- window(MY_DATA[,2:2], start=1970, end=c(2016,3),nTrain+16)


        s <- FORECASTING_FUNCTION_ETS(train)
        sp<- predict(s,h=16)

        cat("----------------------------------

                              Data Partition",i,"

                Training Set includes",nTrain," time periods. Observations 1 to", nTrain, "
                Test Set includes 16 time periods. Observations", nTrain+1, "to", nTrain+16,"

                              ")
        print(accuracy(sp,test))
        cat("

                              ")
        print(sp$model)
        }

So far so good :) This code works fine with one series (Consumption) and I can see all results for Training and Test set.

But here my intention is to use above code for partitioning, not only for one but for all four series (Consumption,Income, Production and Savings) in same time. For that reasons I try with code below where I use "[,i]" in order to get results from all four series with code below:

#3.Trying to upgrade code above

for (i in 1:20)
{ nTest[,i] <- 16*i  
nTrain[,i] <- length(MY_DATA[,i])- nTest
train[,i] <- window(MY_DATA[,i],start=1970, end=c(2015,3),nTrain)
test[,i] <- window(MY_DATA[,i], start=1970, end=c(2016,3),nTrain+16)


s <- FORECASTING_FUNCTION_ETS(train[,i])
sp<- predict(s[,i],h=16)

cat("----------------------------------

                              Data Partition",i,"

                Training Set includes",nTrain," time periods. Observations 1 to", nTrain, "
                Test Set includes 16 time periods. Observations", nTrain+1, "to", nTrain+16,"

                              ")
print(accuracy(sp,test))
cat("

                              ")
print(sp$model)
}

But there are some mistake and this code don't work properly. So can anybody help me how to fugire out this problem and fix code above?


Solution

  • This isn't exactly what you asked for, so I don't expect you to accept this answer, but it's a fun problem for me, so I thought I'd offer an approach anyway.

    I'll start by assuming that your primary goal here is to figure out how to iterate a process for assessing the accuracy of a forecasting approach across multiple time series. You want to do that with an expanding window, where you gradually increase the proportion of your data included in the training set while repeatedly trying to forecast some fixed number of steps ahead, a process that mimics how this task often goes in real life.

    For simplicity's sake, I'm also going to assume that you don't really need to print all that output to the console and are really more interested in seeing the distribution and summary statistics for accuracy metrics associated with those iterations (like the table at the end of the example you're trying to follow).

    Starting from those assumptions, here's one way to go.

    # Split your data frame into a list of one-column data frames (here, time series) using as.list,
    # then use lapply to iterate your validation process over those series.
    Y <- lapply(as.list(MY_DATA), function(x) {
    
        # Instead of a for loop, let's use sapply to iterate over a vector of integers
        # representing the width of the training set in our expanding window, starting at
        # 70 percent of the full series and running to the series' end. Let's assume that,
        # in each iteration, we're going to forecast the following four quarters. 
        sapply(ceiling(length(x) * 0.7):(length(x) - 4), function(i) {
    
            # Because we're using indices instead of dates, we need to partition the 
            # series with subset instead of window. The training set runs from the start
            # of the series to our integer, and the test set grabs the next 4 quarters.
            train <- subset(x, end = i)
            test <- subset(x, start = i + 1, end = i + 4)
    
            # Now we fit an ETS model to that training set and use it to generate
            # forecasts for the following 4 quarters.
            mod <- ets(train)
            preds <- predict(mod, h = 4)
    
            # Finally, we check the accuracy of those forecasts against the test set...
            check <- accuracy(preds, test)
    
            # ...and return the accuracy metric of our choice (I've picked MAPE because
            # that's the one used in the example you're trying to follow, but that's easy
            # to change, or you could just return the accuracy object if you want options).
            return(check["Test set", "MAPE"])
    
        })
    
    })
    

    In this instance, that process returns a list of four vectors, each of which has a length of 53. Because those vectors are in a list, you can easily summarize them to get a sense of the overall accuracy for each series. I like to look at the distribution of accuracy measures, which you could easily do here with a density plot. Of course, the simplest thing is just to look at the central tendency:

    > sapply(Y, mean)
    Consumption      Income  Production     Savings 
       131.4818    172.7535    138.3171    106.9114
    

    If you wanted to compare the results for ETS to the results from some other forecasting process, you could just swap out the bit where the model is fitted, rerun, and compare the summaries. Or you could fold that comparison into the process, using lapply instead of sapply and returning a matrix or data frame with the results from the two processes side by side.

    Like I said, I know this strays a bit from your attempt to directly implement the approach in that blog post, but I think it's consistent with the spirit of your effort, and it was fun for me to work out.