
R auto.arima forecast


I want to create a forecast, and I chose auto.arima. After training the model, I can't work out how to calculate forecasts for two or more articles:

my_forecast <- ts(frc$sales_30, frequency = 12)

my_forecast  <- tsclean(my_forecast)

fit <- auto.arima(my_forecast)

But I have 100 articles and I need a forecast for each of them (format: Year, Month, Sales, Article)


Solution

  • The typical workflow in R for this task is list-based: you split your data by article into list items and apply functions to each item. As you may have noticed already, the year and month columns are irrelevant here, because ts() builds the time index from its frequency argument rather than from a date column.

    Therefore this sample works with articles A and B only, together with their imaginary monthly sales vectors, which we assume are already sorted by date.

    I will not dive into the technicalities of time-series analysis and prediction; the focus is on the process and code for making multiple predictions from a df that contains all articles (or any other one-level grouping) and the corresponding sales history. I did not use the tsclean() function, but it should be evident from the workflow how to include it:

    library(forecast)
    library(tidyverse)
    # set up some dummy data (has no clear pattern in terms of seasonality etc., but works for a demo)
    ## bear in mind that this is randomly generated data, so you most likely will not reproduce my numbers; setting a seed with set.seed() works around this.
    df <- data.frame(article = c(rep("A", 24), rep("B", 24)), 
                     sales = c(sample(seq(from = 20, to = 50, by = 5), size = 24, replace = TRUE),
                               sample(seq(from = 20, to = 50, by = 5), size = 24, replace = TRUE)))
    # build grouping inside the df/tibble
    dfg <- df %>% 
        dplyr::group_by(article) 
    # split the new df by grouping criteria into list
    dfl <- dfg %>%
        dplyr::group_split(.keep = FALSE)
    # set list names according to article value (not needed, but might be helpful for you)
    names(dfl) <- dplyr::group_keys(dfg)$article
    # apply ts function with frequency 12 to the list items
    dflt <- lapply(dfl, ts, frequency = 12)
    # apply the auto.arima to build list of models
    dfltm <- lapply(dflt, forecast::auto.arima)
    # apply forecast with horizon 2 on the list of final models from auto.arima
    predictions <- lapply(dfltm, forecast::forecast, h = 2)
    # print results
    predictions 
    
    $A
          Point Forecast    Lo 80    Hi 80    Lo 95   Hi 95
    Jan 3       34.79167 22.47636 47.10697 15.95703 53.6263
    Feb 3       34.79167 22.47636 47.10697 15.95703 53.6263
    
    $B
          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
    Jan 3       34.58333 20.32802 48.83865 12.78171 56.38496
    Feb 3       34.58333 20.32802 48.83865 12.78171 56.38496
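
    To include tsclean() and make the demo reproducible, the same list workflow could be sketched as follows. This is a minimal sketch, not the code that produced the output above: it uses base split() instead of group_split(), and the seed value and the names sales_by_article and series are illustrative assumptions.

    ```r
    library(forecast)

    # hypothetical demo data: 24 monthly sales values per article, fixed seed for reproducibility
    set.seed(42)
    df <- data.frame(article = rep(c("A", "B"), each = 24),
                     sales = sample(seq(from = 20, to = 50, by = 5), size = 48, replace = TRUE),
                     stringsAsFactors = FALSE)

    # split the sales vector by article (base R alternative to group_split)
    sales_by_article <- split(df$sales, df$article)

    # build a monthly series per article and clean it with tsclean()
    series <- lapply(sales_by_article, function(x) forecast::tsclean(ts(x, frequency = 12)))

    # fit one model per article and forecast two months ahead
    models <- lapply(series, forecast::auto.arima)
    predictions <- lapply(models, forecast::forecast, h = 2)
    ```

    Because the list is named by article, predictions$A and predictions$B return the forecast objects directly, and the same code scales to 100 articles without changes.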
    

    A more modern way of doing the same thing is to work with list-columns inside a nested tibble:

    # build list-column inside the tibble/df from the existing groupings
    npd <- tidyr::nest(dfg) %>%
        # generate new column of ts series data
        dplyr::mutate(tsdata = purrr::map(data, ~ ts(.x, frequency = 12)),
                      # use auto.arima on the data to build new column of final auto.arima models
                      models = purrr::map(tsdata, ~ forecast::auto.arima(.x)),
                      # generate forecast as new column
                      predictions = purrr::map(models, ~ forecast::forecast(.x, h = 2)))
    # print prediction results
    npd$predictions
    [[1]]
          Point Forecast    Lo 80    Hi 80    Lo 95   Hi 95
    Jan 3       34.79167 22.47636 47.10697 15.95703 53.6263
    Feb 3       34.79167 22.47636 47.10697 15.95703 53.6263
    
    [[2]]
          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
    Jan 3       34.58333 20.32802 48.83865 12.78171 56.38496
    Feb 3       34.58333 20.32802 48.83865 12.78171 56.38496
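
    Since the question asks for the result in a table with one row per article, the list of forecast objects can be flattened into a single data frame. The sketch below is one possible way, assuming the default interval levels of forecast() (80 and 95); the demo data, the seed, and the column names step, forecast, lo95 and hi95 are hypothetical.

    ```r
    library(forecast)

    # hypothetical demo data: one numeric sales vector per article
    set.seed(1)
    sales_by_article <- list(A = sample(20:50, 24, replace = TRUE),
                             B = sample(20:50, 24, replace = TRUE))

    # fit and forecast per article (h = 2 months ahead)
    predictions <- lapply(sales_by_article, function(x) {
      forecast::forecast(forecast::auto.arima(ts(x, frequency = 12)), h = 2)
    })

    # one row per article and forecast step
    result <- do.call(rbind, lapply(names(predictions), function(a) {
      fc <- predictions[[a]]
      data.frame(article = a,
                 step = seq_along(fc$mean),
                 forecast = as.numeric(fc$mean),
                 # second column of lower/upper is the 95% level with the default level = c(80, 95)
                 lo95 = as.numeric(fc$lower[, 2]),
                 hi95 = as.numeric(fc$upper[, 2]),
                 stringsAsFactors = FALSE)
    }))
    ```

    The step column counts months after the last observation; mapping it back to a Year and Month requires knowing the last observed month of each article.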
    

    As mentioned initially, the ts() function works from the frequency, not from a date column. This means you have to ensure that months with no sales are still listed and that every article has a complete timeline, sorted in ascending date order. Missing months have to be added before the time-series object is created.
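
    Filling in the missing months could look like the sketch below, which uses tidyr::complete() on data in the Year, Month, Sales, Article format from the question. The data frame raw and its values are hypothetical, and filling with 0 assumes that a missing row really means "no sales" rather than "unknown".

    ```r
    library(dplyr)
    library(tidyr)

    # hypothetical raw data: article A has no row for February 2020
    raw <- data.frame(article = c("A", "A", "B", "B", "B"),
                      year    = 2020,
                      month   = c(1, 3, 1, 2, 3),
                      sales   = c(10, 30, 5, 15, 25),
                      stringsAsFactors = FALSE)

    # add a row for every article and every (year, month) pair seen in the data,
    # setting sales to 0 where no row existed, then sort by article and date
    full <- raw %>%
      tidyr::complete(article, tidyr::nesting(year, month), fill = list(sales = 0)) %>%
      dplyr::arrange(article, year, month)

    # build one monthly series per article from the completed data
    series <- lapply(split(full$sales, full$article), ts, start = c(2020, 1), frequency = 12)
    ```

    Note that nesting(year, month) only covers months that occur for at least one article; a month missing for all articles would need an explicit month sequence instead.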

    Finally, I highly recommend the open book by the author of the forecast package, which can be found here: https://otexts.com/fpp2/