I have been using the tbats and nnetar functions from the forecast package to produce hourly electric load forecasts with horizons of a week and a month, and both models perform satisfactorily. My data set comprises hourly values from January 2017 up to early May 2022 (46848 values). However, when I try to forecast the hourly load up to the end of the year (07/05/2022-31/12/2022, 5736 hourly values), the results are either flat or lose their seasonality. Does anyone have any idea why the long-term forecast gives such poor results? Any ideas on either model would be highly appreciated. I apologise for the very large data set.
I have uploaded the data set to GitHub:
df <- read.csv(file = "https://raw.githubusercontent.com/Argiro1983/Load/LOAD/LOAD_2017_2022.csv", sep=";")
#fix datetime
df$TIME<- with(df, sprintf("%02d:00", TIME-1))
df$DATE<-as.Date(df$DATE, "%d/%m/%Y")
df$TIME <- paste(df$TIME, ':00', sep = '')
View(df)
library(ggpubr)
library(chron)
df$TIME <- chron(times=df$TIME)
DATETIME<-as.POSIXct(paste(df$DATE, df$TIME), origin = "1970-01-01 00:00:00", tz="UTC", usetz=TRUE)
my_df <- data.frame(timestamp = as.POSIXct(DATETIME, format = "%d.%m.%Y %H:%M", origin = "1970-01-01 00:00:00", tz = "UTC"), input = df[,3])
my_df <- setNames(my_df, c("DATETIME","LOAD"))
The TBATS forecasts in particular lose their seasonality and look strange. The code I used is the following:
library(ggplot2)
library(forecast)
library(tseries)
library(dplyr)
Load = ts(my_df[, c('LOAD')])
my_df$Clean_Load = tsclean(Load)
Clean_Load = ts(my_df[, c('Clean_Load')])
load_ts = ts(Clean_Load)
msts <- msts(load_ts, seasonal.periods=c(24,168,8760), start=c(2017,01))
plot(msts, main="Load", xlab="Year", ylab="MWh")
s <- tbats(msts)
sp<- predict(s,h=5736)
The results are also flat when I run the nnetar function, with or without temperature as an external regressor. I have tried different lambdas, but none seems to work:
#create dataframe for temperature historical values
Temperature_history <- read.csv(file = "https://raw.githubusercontent.com/Argiro1983/Load/LOAD/Temperature_history.csv", sep=";")
DATETIME<-as.POSIXct(Temperature_history$Datetime, format = "%d/%m/%Y %H:%M", tz="UCT", usetz=TRUE)
Temperature_df <- data.frame(timestamp = as.POSIXct(DATETIME, format = "%d/%m/%Y %H:%M", tz = "UCT"), input = Temperature_history$Temperature)
Temperature_df<- setNames(Temperature_df, c("DATETIME","TEMPERATURE"))
#create dataframe for temperature forecasted values
Temperature_forecast <- read.csv(file = "https://raw.githubusercontent.com/Argiro1983/Load/LOAD/Temperature_forecast.csv", sep=";")
DATETIME2<-as.POSIXct(Temperature_forecast$datehour, format = "%d/%m/%Y %H:%M", tz="UCT", usetz=TRUE)
Temp_forecast <- data.frame(timestamp = as.POSIXct(DATETIME2, format = "%d/%m/%Y %H:%M", tz = "UCT"), input = Temperature_forecast$TEMP_FORECAST)
View(Temp_forecast)
Temp_forecast <- setNames(Temp_forecast, c("DATETIME","TEMPERATURE"))
View(Temp_forecast)
#define and run NN model
library(forecast)
myts = ts(my_df$LOAD, frequency = 24)
fit2 = nnetar(myts,xreg = Temperature_df$TEMPERATURE, lambda = 0.5, P=1, MaxNWts=1177)
nnetforecast <- forecast(fit2, xreg = Temp_forecast$TEMPERATURE, h = 5736, PI = F, npaths=100, bootstrap = TRUE)
autoplot(nnetforecast, h = 5736)
First, your code won't work because the GitHub link does not point to the CSV file. Replace the first line as follows:
df <- read.csv(file = "https://raw.githubusercontent.com/Argiro1983/Load/LOAD/LOAD_2017_2022.csv", sep=";")
Then running your code, I get reasonable results for the tbats model for the first few weeks:
sp <- forecast(s,h=14*24)
autoplot(sp, include=14*24)
Using a time series model to forecast much further ahead makes little sense here.
In any case, there are well-developed models for electricity demand that will do better than either TBATS or NNETAR. For a simple starting point, try Tao Hong's vanilla model, described in Section 2.2 of https://doi.org/10.1016/j.ijforecast.2015.09.006. It's just a linear regression, but it will do better than either of the models you are trying.
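As a rough illustration of what such a regression can look like in R, here is a minimal sketch on synthetic data. The formula follows the commonly cited form of the vanilla benchmark (a linear trend, month, day of week interacted with hour, and a cubic in temperature interacted with month and with hour); check it against the paper before relying on it. The simulated series, the column names, and the 52/8-week split are assumptions made for illustration and are not taken from the data in the question.

```r
# Synthetic hourly data standing in for the real load and temperature series
# (assumption: swap in my_df$LOAD and Temperature_df$TEMPERATURE for real use).
set.seed(1)
n    <- 24 * 7 * 60                                   # 60 weeks of hourly values
time <- seq(as.POSIXct("2021-01-04 00:00:00", tz = "UTC"),
            by = "hour", length.out = n)
hrs  <- seq_len(n)
temp <- 15 - 10 * cos(2 * pi * hrs / 8760) + 3 * sin(2 * pi * hrs / 24) + rnorm(n)
load <- 5000 + 2 * (temp - 18)^2 + 300 * sin(2 * pi * hrs / 24) + rnorm(n, sd = 50)

# Calendar variables as factors, plus a linear trend.
lt  <- as.POSIXlt(time)
dat <- data.frame(
  LOAD  = load,
  TEMP  = temp,
  Trend = hrs,
  Month = factor(lt$mon + 1),
  Day   = factor(lt$wday),
  Hour  = factor(lt$hour)
)

# Hold out the last 8 weeks; the first 52 weeks contain every month.
train <- dat[hrs <= 24 * 7 * 52, ]
test  <- dat[hrs >  24 * 7 * 52, ]

# Trend + Day*Hour + cubic temperature terms interacted with Month and Hour.
fit <- lm(
  LOAD ~ Trend + Day * Hour +
    Month * (TEMP + I(TEMP^2) + I(TEMP^3)) +
    Hour  * (TEMP + I(TEMP^2) + I(TEMP^3)),
  data = train
)

# Predict the hold-out period and compute its MAPE (in percent).
pred <- predict(fit, newdata = test)
mape <- mean(abs((test$LOAD - pred) / test$LOAD)) * 100
```

To forecast to the end of the year, newdata would need the same calendar columns for the future hours together with forecast temperatures, for example those in Temp_forecast. The quality of the load forecast then depends heavily on the quality of those temperature forecasts.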