I have the following data table and want to predict DE prices from its other variables using a GLM (generalized linear model).
library(data.table)
set.seed(123)
dt.data <- data.table(date = seq(as.Date('2019-01-01'), by = '1 day', length.out = 731),
'DE' = rnorm(731, 30, 1), 'windDE' = rnorm(731, 10, 1),
'consumptionDE' = rnorm(731, 50, 1), 'nuclearDE' = rnorm(731, 8, 1),
'solarDE' = rnorm(731, 1, 1), check.names = FALSE)
dt.forecastData <- dt.data
dt.forecastData <- na.omit(dt.forecastData)
fromTestDate <- "2019-12-31"
fromDateTest <- base::toString(fromTestDate)
## Create train and test date-vectors depending on fromDateTest: ##
v.train <- which(dt.forecastData$date <= fromDateTest)
v.test <- which(dt.forecastData$date == as.Date(fromDateTest)+1)
## Create data tables for train and test data with specific date range (fromTestDate): ##
dt.train <- dt.forecastData[v.train]
v.trainDate <- dt.train$date
dt.test <- dt.forecastData[v.test]
v.testDate <- dt.test$date
## Delete column "date" of train and test data for model fitting: ##
dt.train <- dt.train[, c("date") := NULL]
dt.test <- dt.test[, c("date") := NULL]
## MODEL FITTING: ##
## Generalized Linear Model: ##
xgbModel <- stats::glm(DE ~ .-1, data = dt.train,
family = quasi(link = "identity", variance = "constant"))
## Train and Test Data PREDICTION with xgbModel: ##
dt.train$prediction <- stats::predict.glm(xgbModel, dt.train)
dt.test$prediction <- stats::predict.glm(xgbModel, dt.test)
## Add date columns to dt.train and dt.test: ##
dt.train <- data.table(date = v.trainDate, dt.train)
dt.test <- data.table(date = v.testDate, dt.test)
In this code I train the model with the data from 2019-01-01 to 2019-12-31 and test it with the day-ahead forecast for 2020-01-01.
Now I want to create a for-loop so that I run my model 365 times in total, as follows:
Run 1:
a) use 01-01-2019 to 31-12-2019 to train my model
b) predict for 01-01-2020 (test data)
c) use the actual data point for 01-01-2020 to evaluate the prediction
Run 2:
a) use 01-01-2019 to 01-01-2020 to train my model
b) predict for 02-01-2020
c) use the actual data point for 02-01-2020 to evaluate the prediction
etc.
In the end, I want to plot, e.g., the cumulative sum of the individual prediction performances, or a histogram of the individual prediction performances together with some summary statistics (mean, median, sd, etc.).
Unfortunately, I don't know how to start with the loop or where to save the predictions of each run. I hope someone can help me with this!
Basically, you have to construct a vector that contains the end dates for each run. Then, you can pick one of the end dates in each iteration of the loop, run the model and predict one day ahead. Using your code, this may look something like this:
library(data.table)
set.seed(123)
dt.data <- data.table(date = seq(as.Date('2019-01-01'), by = '1 day', length.out = 731),
'DE' = rnorm(731, 30, 1), 'windDE' = rnorm(731, 10, 1),
'consumptionDE' = rnorm(731, 50, 1), 'nuclearDE' = rnorm(731, 8, 1),
'solarDE' = rnorm(731, 1, 1), check.names = FALSE)
dt.forecastData <- dt.data
dt.forecastData <- na.omit(dt.forecastData)
Here, I construct a vector holding all days between Dec 31 2019 and Jan 15 2020; adapt as needed:
# vector of all end dates
eval.dates <- seq.Date(from = as.Date("2019-12-31"),
to = as.Date("2020-01-15"),
by = 1)
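To cover every day of 2020 instead of the first two weeks, the end dates can be derived from the data itself. The following is a minimal sketch, assuming the same 731-day date range as in the simulated data; all.dates, first.end and last.end are names introduced here for illustration. Note that 2020 is a leap year, so this gives 366 runs rather than 365.

```r
# Dates covered by the simulated data: 2019-01-01 to 2020-12-31
all.dates <- seq(as.Date("2019-01-01"), by = "1 day", length.out = 731)

# First training window ends on the last day of 2019
first.end <- as.Date("2019-12-31")

# Last usable end date: the day before the final observation,
# so that a realised value exists for the one-day-ahead forecast
last.end <- max(all.dates) - 1

eval.dates <- seq.Date(from = first.end, to = last.end, by = 1)
length(eval.dates)
```

With real data, all.dates would simply be dt.forecastData$date.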
Here, I create a storage vector for the one-day-ahead predictions:
# storage vector for all predictions
test.predictions <- numeric(length = length(eval.dates))
Now, run the loop using your code and pick one of the end dates in each iteration:
for (ii in seq_along(eval.dates)) { # loop start
  fromTestDate <- eval.dates[ii] # get end date for iteration
  fromDateTest <- base::toString(fromTestDate)
  ## Create train and test date-vectors depending on fromDateTest: ##
  v.train <- which(dt.forecastData$date <= fromDateTest)
  v.test <- which(dt.forecastData$date == as.Date(fromDateTest)+1)
  ## Create data tables for train and test data with specific date range (fromTestDate): ##
  dt.train <- dt.forecastData[v.train]
  v.trainDate <- dt.train$date
  dt.test <- dt.forecastData[v.test]
  v.testDate <- dt.test$date
  ## Delete column "date" of train and test data for model fitting: ##
  dt.train <- dt.train[, c("date") := NULL]
  dt.test <- dt.test[, c("date") := NULL]
  ## MODEL FITTING: ##
  ## Generalized Linear Model: ##
  xgbModel <- stats::glm(DE ~ .-1, data = dt.train,
                         family = quasi(link = "identity", variance = "constant"))
  ## Test Data PREDICTION with xgbModel: ##
  test.predictions[ii] <- stats::predict.glm(xgbModel, newdata = dt.test)
  # verbose
  print(ii)
} # loop end
As you can see, this is a bit of a shortened version of your code and I omitted the predictions for the training set for brevity. They can easily be added along the lines of the code you have above.
You did not specify which measures you want to use to evaluate your out-of-sample predictions. The object test.predictions holds all your one-step-ahead predictions, and you can use it to compute RMSEs, LPS, or whatever measure of predictive power you'd like to use.
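As one possible way to do that evaluation, here is a minimal sketch that keeps each prediction next to its date and realised value and then computes forecast errors. It assumes squared error as the performance measure, which the question does not fix. The names dt.eval, error, cum.sq.error, rmse and error.summary are introduced here for illustration only. So that the block runs on its own, it rebuilds the simulated data and repeats the loop in compact form with lapply and rbindlist.

```r
library(data.table)

# Rebuild the simulated data from above so this block is self-contained
set.seed(123)
dt.forecastData <- data.table(
  date = seq(as.Date("2019-01-01"), by = "1 day", length.out = 731),
  DE = rnorm(731, 30, 1), windDE = rnorm(731, 10, 1),
  consumptionDE = rnorm(731, 50, 1), nuclearDE = rnorm(731, 8, 1),
  solarDE = rnorm(731, 1, 1))

eval.dates <- seq.Date(as.Date("2019-12-31"), as.Date("2020-01-15"), by = 1)

# One row per run: forecast date, prediction and realised value
dt.eval <- rbindlist(lapply(seq_along(eval.dates), function(ii) {
  end.date <- eval.dates[ii]
  dt.train <- dt.forecastData[date <= end.date]
  dt.train[, date := NULL]  # drop date so it is not used as a regressor
  dt.test <- dt.forecastData[date == end.date + 1]
  fit <- stats::glm(DE ~ . - 1, data = dt.train,
                    family = quasi(link = "identity", variance = "constant"))
  data.table(date = dt.test$date,
             prediction = unname(stats::predict(fit, newdata = dt.test)),
             actual = dt.test$DE)
}))

# Forecast errors and their cumulative squared sum
dt.eval[, error := actual - prediction]
dt.eval[, cum.sq.error := cumsum(error^2)]

# Summary statistics of the individual forecast errors
rmse <- dt.eval[, sqrt(mean(error^2))]
error.summary <- dt.eval[, .(mean = mean(error),
                             median = median(error),
                             sd = sd(error))]
print(rmse)
print(error.summary)
```

From here, plot(dt.eval$date, dt.eval$cum.sq.error, type = "l") draws the cumulative performance over time and hist(dt.eval$error) gives the histogram of the individual errors. Another loss, such as absolute error, can be swapped in by changing the two lines that square the error.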