Tags: r, dataset, time-series, data-mining, forecasting

Forecasting data for multiple products


I would like to predict the next 5 orders and the quantity of the 3 products in each order.

I am a beginner with R and time series. The examples I have seen use ARIMA, but they forecast only one series, not several products as in my example.

Should I use ARIMA? What exactly should I do?

Sorry for my bad English. Thank you in advance.

dateordrer,product1,product2,product3
12/01/2012,2565,3254,635
25/01/2012,2270,3254,670
01/03/2012,2000,785,0
05/05/2012,300,3254,750
26/06/2012,3340,0,540
30/06/2012,0,3254,0
21/06/2012,3360,3356,830
01/07/2012,2470,3456,884
03/07/2012,3680,3554,944
05/07/2012,2817,3854,0
09/07/2012,4210,4254,32
09/08/2012,0,3254,1108
13/09/2012,4560,5210,952
25/09/2012,4452,4256,1143
31/09/2012,5090,5469,199
25/11/2012,5100,5569,0
10/12/2012,5440,5789,1323
11/12/2012,5528,5426,1350

Solution

  • Your question is very broad, so it can only be answered in a broad manner. Also, the question has more to do with forecasting theory than with R. I will give you two pointers to get you started...

    1. It seems you have some pre-processing to do: what are your time intervals? What is your basic time unit (week? month?)? You should aggregate the data according to that time unit. For this kind of operation you can use the dplyr, tidyr and lubridate packages. Here's an example of your data set after I arranged it a bit:

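      library(readr)
      library(dplyr)
      library(tidyr)
      library(lubridate)

      # Parse the day/month/year strings; an impossible date such as
      # "31/09/2012" becomes NA
      data.raw <- read_csv("data1.csv") %>%
                  mutate(date.re = as.Date(dateordrer, format = "%d/%m/%Y"))

      # One row per calendar month between the first and the last order
      complete.dates <- range(floor_date(data.raw$date.re, "month"), na.rm = TRUE)
      dates.seq <- seq(complete.dates[1], complete.dates[2], by = "month")
      series <- data.frame(sale.month = month(dates.seq), sale.year = year(dates.seq))

      # Sum the quantities per month, then add the months without orders as 0s
      data.post <- data.raw %>%
                   filter(!is.na(date.re)) %>%
                   mutate(sale.month = month(date.re), sale.year = year(date.re)) %>%
                   group_by(sale.month, sale.year) %>%
                   summarise(across(product1:product3, sum), .groups = "drop") %>%
                   right_join(series, by = c("sale.month", "sale.year")) %>%
                   replace_na(list(product1 = 0, product2 = 0, product3 = 0)) %>%
                   arrange(sale.year, sale.month)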
      

    It would look like this:

            sale.month  sale.year   product1    product2    product3
            1           2012        4835        6508        1305
            2           2012        0           0           0
            3           2012        2000        785         0
            4           2012        0           0           0
            etc...
    

    Months 2 and 4 had no orders in the original data, so they appear as 0s. Also note that 31/09/2012 is not a valid date (September has 30 days), so that order is parsed as NA and left out; you should check what its real date is. Pre-processing is not to be taken lightly: I used months as the basic unit, but that might not be right or relevant for your goals. You might even revise this choice later and check whether a different aggregation gives better results.

    2. Only after pre-processing can you turn to forecasting. If the three products are independent, they can be forecast independently (e.g. fit ARIMA, Holt-Winters or any other model three times, once per product). However, the fact that you have three products that might be related to each other points us to hierarchical time series (package hts). The hts() function in this package organizes the individual product series into a hierarchy, and forecasting the resulting object produces forecasts for every level (each product and their total) that are reconciled so they add up consistently. Relationships between products arise, for example, when a certain product is purchased together with another (complementary products) or when an out-of-stock product leads customers to buy a different one (substitute products).

    Since this answer is far from self-contained for such a broad topic, your best next step is to read the following online book:

    Forecasting: Principles and Practice

    by Hyndman and Athanasopoulos. I read it when I started with time series, and it's a very good book. For multiple time series in particular, you should cover section:

    9.4 Forecasting hierarchical or grouped time series

    Make sure you also read chapter 7 in that book (before moving on to 9.4).
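The two steps above (aggregate to a time unit, then forecast each product) can be sketched end to end in base R, without the tidyverse pipeline. This is only a minimal sketch under a few assumptions: months as the time unit, the question's data embedded as text so the example runs without `data1.csv` (the invalid date 31/09/2012 is parsed as NA and dropped), and simple exponential smoothing via `stats::HoltWinters` standing in for whatever per-product model you end up choosing. Simple exponential smoothing gives a flat forecast (the same value for all 5 months), and with only 12 monthly points the numbers are illustrative, not reliable.

```r
# The question's data, embedded so the example runs on its own
orders <- read.csv(text = "dateordrer,product1,product2,product3
12/01/2012,2565,3254,635
25/01/2012,2270,3254,670
01/03/2012,2000,785,0
05/05/2012,300,3254,750
26/06/2012,3340,0,540
30/06/2012,0,3254,0
21/06/2012,3360,3356,830
01/07/2012,2470,3456,884
03/07/2012,3680,3554,944
05/07/2012,2817,3854,0
09/07/2012,4210,4254,32
09/08/2012,0,3254,1108
13/09/2012,4560,5210,952
25/09/2012,4452,4256,1143
31/09/2012,5090,5469,199
25/11/2012,5100,5569,0
10/12/2012,5440,5789,1323
11/12/2012,5528,5426,1350")

# Parse day/month/year; "31/09/2012" does not exist, becomes NA and is dropped
orders$date <- as.Date(orders$dateordrer, format = "%d/%m/%Y")
orders <- orders[!is.na(orders$date), ]
orders$month <- format(orders$date, "%Y-%m")

# Sum each product per calendar month
monthly <- aggregate(cbind(product1, product2, product3) ~ month,
                     data = orders, FUN = sum)

# Add the months without orders as 0s (merge() sorts the rows by month)
first <- as.Date(paste0(min(orders$month), "-01"))
last <- as.Date(paste0(max(orders$month), "-01"))
all.months <- data.frame(month = format(seq(first, last, by = "month"), "%Y-%m"))
monthly <- merge(all.months, monthly, all.x = TRUE)
monthly[is.na(monthly)] <- 0

# Monthly multivariate time series, one column per product
sales <- ts(as.matrix(monthly[, c("product1", "product2", "product3")]),
            start = c(as.integer(format(first, "%Y")), as.integer(format(first, "%m"))),
            frequency = 12)

# Forecast each product independently, 5 months ahead, with simple
# exponential smoothing (Holt-Winters without trend or seasonal terms)
h <- 5
fc <- sapply(colnames(sales), function(p) {
  fit <- HoltWinters(sales[, p], beta = FALSE, gamma = FALSE)
  as.numeric(predict(fit, n.ahead = h))
})

# Bottom-up total: the sum of the three product forecasts
fc <- cbind(fc, total = rowSums(fc))
print(round(fc))
```

Summing the product forecasts, as in the last step, is the simplest (bottom-up) way to get a total that is consistent with the individual forecasts. As far as I know, the hts package offers bottom-up forecasting as one of its methods, alongside reconciliation approaches that also use the forecasts of the aggregated series.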