Tags: r, time-series, forecasting, fable-r

Get rid of negative values in the prediction interval of fable's forecast using an ETS model


I have used fable's forecast with an ETS model like so:

stl.fc <- train |>
  model(stlf = decomposition_model(
    STL(value), # decomposition to use (STL)
    ETS(season_adjust),
    SNAIVE(season_year) # it's the default but we make it explicit
  )) |>
  forecast(h = TIME_HORIZON)

My data consists of the number of asylum applications, so it must be positive or zero. When I plot the result of the forecast with autoplot, the prediction interval is also shown (which is good), but its values can also be negative (which is not good).

stl.fc |>
  autoplot(data.tsbl)

[Image: plot of the forecast; the prediction interval extends below zero]

I am trying to get rid of those negative values, but it seems tricky since the object containing such values is of class "distribution" "vctrs_vctr" "list" and I have no idea how to do this.

Here is a screenshot of what it looks like (the column I am referring to is called value).

[Image: screenshot of the forecast table, including the value column]

Each element in the value column looks like a list of two entries, mu and sigma, which I can see with (e.g.) unlist(stl.fc$value[1])

      mu    sigma 
8087.440 3308.573 
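
For reference, a minimal sketch of how such a mu/sigma pair maps to a prediction interval, assuming the entry is a normal distribution (which the mu and sigma names suggest, but which is not confirmed here). The distribution is rebuilt by hand with distributional::dist_normal() rather than taken from stl.fc, so the snippet runs on its own.

```r
library(distributional)

# Values copied from the unlist() output above
mu <- 8087.440
sigma <- 3308.573

# Assumption: the forecast distribution for this row is Normal(mu, sigma)
d <- dist_normal(mu = mu, sigma = sigma)

# Bounds of the 95% prediction interval
lower_95 <- quantile(d, 0.025)
upper_95 <- quantile(d, 0.975)

# The same lower bound computed by hand from the normal quantile
lower_manual <- mu + qnorm(0.025) * sigma

print(c(lower = lower_95, upper = upper_95))
```

For a normal distribution, the lower 95% bound drops below zero as soon as sigma exceeds roughly mu / 1.96, which is what happens at horizons where the uncertainty is large relative to the point forecast.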

From here I am a bit stuck as to how to remove those negative numbers.

I have been reading the "Ensuring forecasts stay within limits" page of the "Forecasting: Principles and Practice" book, but I fail to understand how to achieve what I want.

Alternatively, I think I could also live with just hiding the negative Y-values in the plot, but this feels more like cheating...


Solution

  • UPDATED WITH A MUCH SIMPLER APPROACH

    Thanks to Mitchell O'Hara-Wild's comment, here is a simple solution:

    stl.fc <- train |>
      model(stlf = decomposition_model(
        STL(value), # decomposition to use (STL)
        ETS(season_adjust),
        SNAIVE(season_year) # it's the default but we make it explicit
      )) |>
      forecast(h = TIME_HORIZON) |>
      mutate(value = distributional::dist_truncated(value, 0)) # SOLUTION ;)
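
To see what the truncation does in isolation, here is a small sketch on a single made-up distribution. The mean and standard deviation are hypothetical, chosen only so that the untruncated lower bound is negative. They do not come from the asylum data.

```r
library(distributional)

# Hypothetical forecast distribution whose lower tail dips below zero
fc_dist <- dist_normal(mu = 500, sigma = 400)

# Same distribution, truncated below at zero
fc_trunc <- dist_truncated(fc_dist, lower = 0)

# Lower bound of the 95% interval before and after truncation
lo_raw <- quantile(fc_dist, 0.025)
lo_trunc <- quantile(fc_trunc, 0.025)

print(c(raw = lo_raw, truncated = lo_trunc))
```

The truncated distribution has no probability mass below zero, so none of its quantiles, and therefore none of the interval bounds drawn by autoplot, can be negative.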
    

    ORIGINAL ANSWER

    I finally came up with a solution that, instead of letting the tool calculate the prediction interval automatically, generates 1000 simulated paths (inspired by this) and, before aggregating them by date, changes the negative values to zero.

    Below are the script and the new plot:

    # create the model
    stl.mod <- train |>
      model(stlf = decomposition_model(
        STL(value), # decomposition to use (STL)
        ETS(season_adjust),
        SNAIVE(season_year) # it's the default but we make it explicit
      ))
    
    # generate simulated paths and change negative values to zero before aggregating
    # https://otexts.com/fpp3/combinations.html
    stl.fc <- stl.mod |>
      # Generate 1000 future sample paths
      generate(h = TIME_HORIZON, times = 1000) |>
      mutate(.sim = if_else(.sim < 0, 0, .sim)) |>
      # Compute forecast distributions from future sample paths
      as_tibble() |>
      group_by(date, .model) |>
      summarise(
        dist = distributional::dist_sample(list(.sim))
      ) |>
      ungroup() |>
      # Create fable object
      as_fable(index = date, key = .model,
               distribution = dist, response = "value")
    
    stl.fc |>
      autoplot(data.tsbl)
    

    [Image: the new forecast plot]
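
The clipping step of the original answer can be sketched without a fitted model. The simulated values below are hypothetical draws from rnorm() standing in for the .sim column of one date. With a real model they would come from generate().

```r
library(distributional)

set.seed(1)

# Hypothetical stand-in for 1000 simulated values of a single forecast date
sims <- rnorm(1000, mean = 500, sd = 400)

# Change negative values to zero (same effect as the if_else() call above)
sims_clipped <- pmax(sims, 0)

# Wrap the clipped sample in an empirical distribution, as dist_sample() does per date
d <- dist_sample(list(sims_clipped))

# Lower bound of the 95% interval and share of draws sitting exactly at zero
lo <- quantile(d, 0.025)
share_zero <- mean(sims_clipped == 0)

print(c(lower_95 = lo, share_at_zero = share_zero))
```

Note that this is not equivalent to dist_truncated(): clipping moves every negative draw onto zero and so creates a point mass there, whereas truncation discards the negative part and rescales the remaining density. Which of the two is more appropriate depends on whether exact zeros are plausible in the data.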