Prophet forecasting by id and populating a data frame with one month ahead forecasts

I have a dataframe containing multiple (thousands) unequal-length monthly time series separated by a non-sequencial id variable. The data set looks like this,

id1 <- rep(12, 60)
ds1 <- seq(as.Date("2014-01-01"), as.Date("2018-12-31"), by = "month")
value1 <- sample(60)

id2 <- rep(132, 48)
ds2 <- seq(as.Date("2015-01-01"), as.Date("2018-12-31"), by = "month")
value2 <- sample(48)

id3 <- rep(210, 72)
ds3 <- seq(as.Date("2013-01-01"), as.Date("2018-12-31"), by = "month")
value3 <- sample(72)

id <- c(id1, id2, id3)
ds <- c(ds1, ds2, ds3)
y <- c(value1, value2, value3)

df <- data.frame(id, ds, y)
> head(df)
  id         ds  y
1 12 2014-01-01 51
2 12 2014-02-01 22
3 12 2014-03-01 34
4 12 2014-04-01 53
5 12 2014-05-01 26
6 12 2014-06-01 56

I want to run the prophet forecasting model on every time series separated by id and generate a data frame with one month ahead forecast with one or two diagnostic statistics. The rows of that data frame should start with the id variable, ie. the first column should be id.

For a single id case, the procedure looks like this,

library(prophet)
set.seed(1234)

id <- rep(23, 60)
ds <- seq(as.Date("2014-01-01"), as.Date("2018-12-31"), by = "month")
y <- sample(60)
df <- data.frame(ds, y)

m <- prophet(df, seasonality.mode = 'multiplicative')
future <- make_future_dataframe(m, periods = 1)
fcst <- predict(m, future)
last_fcst <- fcst[61,]
mse <- mean((df$y - fcst$yhat[c(1:60)])^2)
mae <- mean(abs((df$y - fcst$yhat[c(1:60)])))
final <- cbind(last_fcst, mse, mae)
final
> final
           ds    trend multiplicative_terms multiplicative_terms_lower multiplicative_terms_upper     yearly
61 2018-12-02 27.19465           -0.1401155                 -0.1401155                 -0.1401155 -0.1401155
   yearly_lower yearly_upper additive_terms additive_terms_lower additive_terms_upper yhat_lower yhat_upper
61   -0.1401155   -0.1401155              0                    0                    0   3.689257   42.66293
   trend_lower trend_upper     yhat      mse      mae
61    27.19465    27.19465 23.38425 242.4414 12.80532

I want to repeat this procedure and create a dataset with each one-month forecast with their corresponding row ids. Any idea what's the best way to do that?

Solution

As I said in the comments, it's best to split() by id in a list(). This way you can use lapply() or (purrr::map()) to make predictions and calculate the metrics per each id.

library(prophet)
library(dplyr) # for data wrangling
library(purrr) # for map/map2, equivalents are lapply/mapply from baseR

# preparations
l_df <- df %>% 
  split(.$id)

m_list <- map(l_df, prophet) # prophet call
future_list <- map(m_list, make_future_dataframe, periods = 1) # makes future obs
forecast_list <- map2(m_list, future_list, predict) # map2 because we have two inputs

So, forecast_list will contain the output from the predict, again divided by id.

You can "merge" them back in a data.frame by using bind_rows(forecast_list), as long as each df is equal (same structure).

For the metrics I'd follow the same principle:

# to evaluate the model: create a new list
eval_list <- map2(forecast_list, l_df, function(x,z) {
 # x is the single dataframe of predictions
 # z is the original dataframe with actuals

  x <- x[1:(nrow(x)-1), ] # subset to exclude first true forecast
  x <- x %>% mutate(y_true = (z %>% select(y) %>% pull()) ) # add the column of actual values

})

# metrics evaluation:
eval_list <- map(eval_list, function(x) {
  x <- x %>% 
    summarise(mse = mean((y_true - yhat)^2)) # add more scores
})
# $`12`
# mse
# 1 199.1829
# 
# $`132`
# mse
# 1 156.6394
# 
# $`210`
# mse
# 1 415.9659

You can use map2() like I did for eval_list to bind the true forecast with the metrics if you want.