Search code examples
rtimetime-seriesforecast

Get Prediction Intervals from hts2 package:VaughanR0/Streamline-R: Hierarchical and Grouped Time Series


i am currently working on my first TS-project in R and was using the hts2 package to include predicton intervals. However, i am pretty new to R and simply cannot figure out how to extract prediction intervals from different groups/leves of my gts. For example: How can I get the prediction intervals for the Top series?

Thanks for your help!

Here is the problem: This is what my raw dataframe looks like:

structure(list(mAK50 = c(1294, 1417, 1690, 1827, 1973, 2076, 
2196, 2127, 2078, 2033, 2260, 2309, 2265, 2105), mAK55 = c(1820, 
1921, 2295, 2556, 2820, 3044, 3144, 3366, 3443, 3452, 4144, 4622, 
4773, 4854), mAK60 = c(3401, 3895, 4634, 5078, 5527, 5570, 5549, 
5967, 5670, 6104, 7466, 8193, 8672, 8963), mAK65 = c(5858, 5746, 
6045, 6577, 7002, 7411, 8202, 8239, 7913, 8547, 9642, 10885, 
11536, 11896), mAK70 = c(8832, 9423, 10374, 10639, 10633, 9838, 
8931, 8442, 7597, 8091, 9750, 11050, 11999, 11815), mAK75 = c(8788, 
9304, 10459, 11598, 12378, 12667, 12871, 12720, 11261, 11349, 
12409, 12044, 11765, 11886), mAK80 = c(6896, 7597, 8494, 8809, 
8679, 9086, 9274, 9317, 9155, 10212, 12683, 13813, 14034, 13537
),  mAK85 = c(2595, 2827, 3256, 3702, 4087, 4186, 4446, 4316, 
3915, 4068, 4985, 5747, 6196, 6776), mAK90 = c(524, 656, 814, 
975, 951, 963, 995, 1046, 1064, 1134, 1302, 1460, 1499, 1557), 
wAK50 = c(1630, 1718, 2108, 2334, 2537, 2578, 2797, 2868, 
2847, 2942, 3304, 3459, 3432, 3379), wAK55 = c(2899, 3086, 
3613, 3837, 4250, 4508, 4569, 4809, 4837, 4856, 5886, 6636, 
6977, 6782), wAK60 = c(5800, 6661, 7377, 8250, 8377, 8565, 
8662, 8633, 8115, 8640, 10007, 11020, 11546, 11627), wAK65 = c(10012, 
9838, 10020, 10572, 11102, 11689, 12467, 12637, 11815, 12110, 
14199, 15512, 15846, 15901), wAK70 = c(17834, 18331, 18951, 
18934, 18610, 16847, 14967, 14113, 12781, 13552, 16253, 18579, 
19436, 19140), wAK75 = c(21610, 22108, 23521, 24417, 25463, 
24652, 24424, 22900, 20285, 20136, 21084, 20680, 19711, 19354
), wAK80 = c(18295, 19213, 20224, 20890, 20929, 20803, 20890, 
20401, 18762, 19966, 23731, 25391, 25097, 23846), wAK85 = c(8926, 
9372, 10005, 10664, 10693, 10510, 10713, 9878, 8918, 9213, 
11170, 12460, 13083, 13498), wAK90 = c(1906, 2274, 2676, 
3044, 3125, 3105, 3107, 2996, 2567, 2720, 3029, 3457, 3398, 
3492)), row.names = c(NA, -14L), class = c("tbl_df", "tbl", 
"data.frame"))

m/w stands for men/women, and AK50 for a certain age class(AK 50 = 50 years and younger and so on...)

i have converted this into a grouped time series using characters with:

library (hts)

gts1<-ts(data, start=2005,end=2018)

gts<-gts(gts1, characters = list(1, 4))

Now i would like to do some forecasting (for 32years) with an ARIMA model using this Code:

install.packages("remotes")
remotes::install_github("VaughanR0/Streamline-R")
fcsts = hts2::forecast.gts(gts, h = 32, method = "comb", weights ="wls",keep.intervals=TRUE, fmethod = "arima")                            

This is my output:

summary(fcsts)
Grouped Time Series 
4 Levels 
Number of groups at each level: 1 2 9 18 
Total number of series: 30 
Number of observations in each historical series: 14 
Number of forecasts per series: 32 
Top level series of forecasts: 
Time Series:
Start = 2019 
End = 2050 
Frequency = 1 
[1] 191409.5 191828.0 192924.3 195538.0 199757.3 204983.4 210277.1....

Method: Bottom-up forecasts 
Forecast method: Arima 

Now i would like to get the prediction interval for the Top series. I tried "fcsts$upper" but i only get intervals for the bottom series...

Can anyone help me out?


Solution

  • The hts package does not compute prediction intervals. We have written a new package, fable, that does what you want and is much easier to use for hierarchical and grouped time series. Here is an example using your data.

    library(tidyverse)
    library(tsibble)
    #> 
    #> Attaching package: 'tsibble'
    #> The following object is masked from 'package:dplyr':
    #> 
    #>     id
    library(fable)
    #> Loading required package: fabletools
    
    # Create tsibble
    data <- tibble(
        year = 2005:2018,
        mAK50 = c(1294, 1417, 1690, 1827, 1973, 2076, 2196, 2127, 2078, 2033, 2260, 2309, 2265, 2105),
        mAK55 = c(1820, 1921, 2295, 2556, 2820, 3044, 3144, 3366, 3443, 3452, 4144, 4622, 4773, 4854),
        mAK60 = c(3401, 3895, 4634, 5078, 5527, 5570, 5549, 5967, 5670, 6104, 7466, 8193, 8672, 8963),
        mAK65 = c(5858, 5746, 6045, 6577, 7002, 7411, 8202, 8239, 7913, 8547, 9642, 10885, 11536, 11896),
        mAK70 = c(8832, 9423, 10374, 10639, 10633, 9838, 8931, 8442, 7597, 8091, 9750, 11050, 11999, 11815),
        mAK75 = c(8788, 9304, 10459, 11598, 12378, 12667, 12871, 12720, 11261, 11349, 12409, 12044, 11765, 11886),
        mAK80 = c(6896, 7597, 8494, 8809, 8679, 9086, 9274, 9317, 9155, 10212, 12683, 13813, 14034, 13537),
        mAK85 = c(2595, 2827, 3256, 3702, 4087, 4186, 4446, 4316, 3915, 4068, 4985, 5747, 6196, 6776),
        mAK90 = c(524, 656, 814, 975, 951, 963, 995, 1046, 1064, 1134, 1302, 1460, 1499, 1557),
        wAK50 = c(1630, 1718, 2108, 2334, 2537, 2578, 2797, 2868, 2847, 2942, 3304, 3459, 3432, 3379),
        wAK55 = c(2899, 3086, 3613, 3837, 4250, 4508, 4569, 4809, 4837, 4856, 5886, 6636, 6977, 6782),
        wAK60 = c(5800, 6661, 7377, 8250, 8377, 8565, 8662, 8633, 8115, 8640, 10007, 11020, 11546, 11627),
        wAK65 = c(10012, 9838, 10020, 10572, 11102, 11689, 12467, 12637, 11815, 12110, 14199, 15512, 15846, 15901),
        wAK70 = c(17834, 18331, 18951, 18934, 18610, 16847, 14967, 14113, 12781, 13552, 16253, 18579, 19436, 19140),
        wAK75 = c(21610, 22108, 23521, 24417, 25463, 24652, 24424, 22900, 20285, 20136, 21084, 20680, 19711, 19354),
        wAK80 = c(18295, 19213, 20224, 20890, 20929, 20803, 20890, 20401, 18762, 19966, 23731, 25391, 25097, 23846),
        wAK85 = c(8926, 9372, 10005, 10664, 10693, 10510, 10713, 9878, 8918, 9213, 11170, 12460, 13083, 13498),
        wAK90 = c(1906, 2274, 2676, 3044, 3125, 3105, 3107, 2996, 2567, 2720, 3029, 3457, 3398, 3492),
      ) %>%
      pivot_longer(-year, names_to = "group", values_to = "value") %>%
      mutate(
        sex = stringr::str_sub(group, 1, 1),
        age = stringr::str_sub(group, 4, 5),
      ) %>%
      select(-group) %>%
      as_tsibble(index=year, key=c(sex,age))
    
    data
    #> # A tsibble: 252 x 4 [1Y]
    #> # Key:       sex, age [18]
    #>     year value sex   age  
    #>    <int> <dbl> <chr> <chr>
    #>  1  2005  1294 m     50   
    #>  2  2006  1417 m     50   
    #>  3  2007  1690 m     50   
    #>  4  2008  1827 m     50   
    #>  5  2009  1973 m     50   
    #>  6  2010  2076 m     50   
    #>  7  2011  2196 m     50   
    #>  8  2012  2127 m     50   
    #>  9  2013  2078 m     50   
    #> 10  2014  2033 m     50   
    #> # … with 242 more rows
    
    
    fcsts <- data %>% 
      # Specify group structure and create aggregates
      aggregate_key(sex * age, value = sum(value)) %>%
      # Fit models 
      model(arima = ARIMA(value)) %>%
      # Set up reconciliation
      mutate(mint = min_trace(arima)) %>%
      # Produce the forecasts
      forecast(h = 32)
    #> New names:
    #> * `` -> ...1
    #> * `` -> ...2
    #> * `` -> ...3
    #> * `` -> ...4
    #> * `` -> ...5
    #> * ...
    
    # Create 95% intervals for top level
    fcsts %>% 
      hilo(level=95) %>%
      filter(is_aggregated(sex) & is_aggregated(age))
    #> # A tsibble: 64 x 6 [1Y]
    #> # Key:       sex, age, .model [2]
    #>    sex          age          .model  year   value                  `95%`
    #>    <chr>        <chr>        <chr>  <dbl>   <dbl>                 <hilo>
    #>  1 <aggregated> <aggregated> arima   2019 187774. [173317.2, 202230.3]95
    #>  2 <aggregated> <aggregated> arima   2020 186021. [155948.3, 216094.4]95
    #>  3 <aggregated> <aggregated> arima   2021 185864. [143996.2, 227731.0]95
    #>  4 <aggregated> <aggregated> arima   2022 186589. [137523.8, 235655.0]95
    #>  5 <aggregated> <aggregated> arima   2023 187265. [133768.8, 240760.3]95
    #>  6 <aggregated> <aggregated> arima   2024 187467. [130517.5, 244415.5]95
    #>  7 <aggregated> <aggregated> arima   2025 187303. [126897.4, 247709.1]95
    #>  8 <aggregated> <aggregated> arima   2026 187070. [122946.0, 251194.2]95
    #>  9 <aggregated> <aggregated> arima   2027 186958. [119048.8, 254866.4]95
    #> 10 <aggregated> <aggregated> arima   2028 186979. [115480.3, 258477.4]95
    #> # … with 54 more rows
    

    Created on 2020-04-29 by the reprex package (v0.3.0)