I have a data frame containing thousands of unequal-length monthly time series, distinguished by a non-sequential id variable. The data set looks like this:
id1 <- rep(12, 60)
ds1 <- seq(as.Date("2014-01-01"), as.Date("2018-12-31"), by = "month")
value1 <- sample(60)
id2 <- rep(132, 48)
ds2 <- seq(as.Date("2015-01-01"), as.Date("2018-12-31"), by = "month")
value2 <- sample(48)
id3 <- rep(210, 72)
ds3 <- seq(as.Date("2013-01-01"), as.Date("2018-12-31"), by = "month")
value3 <- sample(72)
id <- c(id1, id2, id3)
ds <- c(ds1, ds2, ds3)
y <- c(value1, value2, value3)
df <- data.frame(id, ds, y)
> head(df)
id ds y
1 12 2014-01-01 51
2 12 2014-02-01 22
3 12 2014-03-01 34
4 12 2014-04-01 53
5 12 2014-05-01 26
6 12 2014-06-01 56
I want to run the prophet forecasting model on every time series, separated by id, and generate a data frame with the one-month-ahead forecast plus one or two diagnostic statistics. The first column of that data frame should be id, so that each row starts with its id.
For a single id, the procedure looks like this:
library(prophet)
set.seed(1234)
id <- rep(23, 60)
ds <- seq(as.Date("2014-01-01"), as.Date("2018-12-31"), by = "month")
y <- sample(60)
df <- data.frame(ds, y)
m <- prophet(df, seasonality.mode = 'multiplicative')
future <- make_future_dataframe(m, periods = 1)
fcst <- predict(m, future)
last_fcst <- fcst[61,]
mse <- mean((df$y - fcst$yhat[c(1:60)])^2)
mae <- mean(abs((df$y - fcst$yhat[c(1:60)])))
final <- cbind(last_fcst, mse, mae)
final
> final
ds trend multiplicative_terms multiplicative_terms_lower multiplicative_terms_upper yearly
61 2018-12-02 27.19465 -0.1401155 -0.1401155 -0.1401155 -0.1401155
yearly_lower yearly_upper additive_terms additive_terms_lower additive_terms_upper yhat_lower yhat_upper
61 -0.1401155 -0.1401155 0 0 0 3.689257 42.66293
trend_lower trend_upper yhat mse mae
61 27.19465 27.19465 23.38425 242.4414 12.80532
I want to repeat this procedure for every id and create a data set holding each one-month forecast alongside its id. What is the best way to do that?
As I said in the comments, it's best to split() the data by id into a list. This way you can use lapply() (or purrr::map()) to make the predictions and calculate the metrics for each id.
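To see what split() produces before any model is involved, here is a minimal base R sketch on data shaped like the question's. The helper make_series() is a hypothetical convenience for rebuilding the three series and is not part of the original post; Map() is used as the base R counterpart of purrr::map2().

```r
# Rebuild three unequal-length monthly series like those in the question.
# make_series() is a hypothetical helper, introduced only for this sketch.
set.seed(1)
make_series <- function(id, start) {
  ds <- seq(as.Date(start), as.Date("2018-12-31"), by = "month")
  data.frame(id = id, ds = ds, y = sample(length(ds)))
}
df <- rbind(make_series(12,  "2014-01-01"),
            make_series(132, "2015-01-01"),
            make_series(210, "2013-01-01"))

# One list element per id; the list names are the ids (as character).
l_df <- split(df, df$id)
names(l_df)

# Each element keeps its own length, so unequal series are not a problem.
n_obs <- sapply(l_df, nrow)

# Map() walks two inputs in parallel, like purrr::map2().
last_dates <- Map(function(d, n) d$ds[n], l_df, n_obs)
```

Because the list is named by id, everything derived from it with lapply()/Map() (or map()/map2()) carries those names along, which is what lets the id be recovered at the end.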
library(prophet)
library(dplyr) # for data wrangling
library(purrr) # for map/map2, equivalents are lapply/mapply from baseR
# preparations
l_df <- df %>%
  split(.$id)
m_list <- map(l_df, prophet) # prophet call
future_list <- map(m_list, make_future_dataframe, periods = 1, freq = "month") # one future month; freq defaults to "day"
forecast_list <- map2(m_list, future_list, predict) # map2 because we have two inputs
So, forecast_list will contain the output from predict(), again divided by id.
You can "merge" them back into a single data.frame by using bind_rows(forecast_list), as long as every data frame in the list has the same structure.
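Since the question asks for id as the first column, it may help to pass the .id argument of dplyr::bind_rows(), which turns the list names into a leading column. The sketch below uses a small stand-in for forecast_list with made-up numbers (not real prophet output), only to show the mechanics.

```r
library(dplyr)

# Stand-in for forecast_list: made-up values, same columns in each element.
forecast_list <- list(
  `12`  = data.frame(ds   = as.Date(c("2018-12-01", "2019-01-01")),
                     yhat = c(30.1, 28.4)),
  `132` = data.frame(ds   = as.Date(c("2018-12-01", "2019-01-01")),
                     yhat = c(22.7, 25.0))
)

# .id = "id" stores the list names in a first column called id (as character).
all_fcst <- bind_rows(forecast_list, .id = "id")

# Keep only the last row per id, i.e. the one-step-ahead forecast.
one_step <- all_fcst %>%
  group_by(id) %>%
  filter(row_number() == n()) %>%
  ungroup()
```

The id column comes back as character; wrap it in as.numeric() if the original numeric type matters.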
For the metrics I'd follow the same principle:
# to evaluate the model: create a new list
eval_list <- map2(forecast_list, l_df, function(x, z) {
  # x is the data frame of predictions for one id
  # z is the original data frame with the actuals for that id
  x <- x[1:(nrow(x) - 1), ]  # drop the last row, the one true (out-of-sample) forecast
  x %>% mutate(y_true = z$y) # add the column of actual values
})
# metrics evaluation:
eval_list <- map(eval_list, function(x) {
  x %>%
    summarise(mse = mean((y_true - yhat)^2)) # add more scores here
})
# $`12`
# mse
# 1 199.1829
#
# $`132`
# mse
# 1 156.6394
#
# $`210`
# mse
# 1 415.9659
You can use map2(), like I did for eval_list, to bind the true forecast to the metrics if you want.
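A minimal sketch of that last step, assuming forecast_list and l_df have the shapes described above: for each id, take the final forecast row, attach the in-sample mse and mae, and stack everything with id first. The two lists below are tiny stand-ins with made-up numbers (three fitted rows plus one future row per id), not real prophet output.

```r
library(dplyr)
library(purrr)

dates <- as.Date(c("2018-10-01", "2018-11-01", "2018-12-01", "2019-01-01"))

# Stand-ins with made-up values: the last row of each forecast is the future month.
forecast_list <- list(
  `12`  = data.frame(ds = dates, yhat = c(10, 12, 14, 16)),
  `132` = data.frame(ds = dates, yhat = c(5, 5, 5, 5))
)
l_df <- list(
  `12`  = data.frame(y = c(11, 12, 13)),
  `132` = data.frame(y = c(5, 7, 3))
)

result <- map2(forecast_list, l_df, function(fc, actual) {
  # Fitted values are the rows that line up with the observed history.
  fitted <- fc$yhat[seq_len(nrow(actual))]
  # Last row is the one-step-ahead forecast; add the diagnostics as columns.
  tail(fc, 1) %>%
    mutate(mse = mean((actual$y - fitted)^2),
           mae = mean(abs(actual$y - fitted)))
}) %>%
  bind_rows(.id = "id")  # one row per id, with id as the first column
```

With real prophet output the same function body should apply unchanged, since predict() returns the history rows followed by the future rows; you would probably also want to select() just ds, yhat, yhat_lower and yhat_upper before binding.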