Search code examples
rdataframetime-serieshierarchical

r gathering multivariate time series


I have a data frame I want to convert to a time series. Problem is for each date I have multiple products. It looks like

    Name_Article Week Num_Any Spending Unit_Price
1      Product_A   1    2016   196.05     3.376000
2      Product_B   1    2016   377.04     1.004867
3      Product_A   2    2016  2979.40     3.376000
4      Product_C   2    2016   353.44     3.034444
5      Product_D   2    2016   160.99     0.653621
6      Product_E   2    2016   950.22     1.441164
7      Product_A   3    2016   196.05     3.376000
8      Product_B   3    2016   377.04     1.004867
9      Product_D   3    2016  2979.40     0.653621
10     Product_E   3    2016   353.44     1.441164
11     Product_A   4    2016   160.99     3.376000
12     Product_B   4    2016   950.22     1.441164

I know that working with weekly time series is not the best option, but I have no choice. My idea is to get something like

  Week Spending.A UnitPrice.A Spending.B UnitPrice.B Spending.C UnitPrice.C ...
    1      196.05    3.376000     377.04    1.004867        0.00   3.034444
    2     2979.40    3.376000       0.00    1.004867      353.44   3.034444
    3      120.05    3.376000     377.04    1.004867        0.00   3.950000
    4      160.99    3.500000     950.22    1.441164    ...

I can't get my head around the tydir functions gather() and spread(). Any help will be appreciated!

In case you were wondering, the goal of all this is to preform a hierarchical forecasting , but before I begin I need my data to be structurated.

Many thanks!


Solution

  • Remove everything up to the underscore in Name_Article and drop the year column. Then read it as a zoo object splitting by Name_Article and specifying the Week is the index.

    If you need a specific form then various conversions are possible such as as.ts(z), fortify.zoo(z), coredata(z), index(z) and zoo(coredata(z)).

    library(zoo)
    
    DF2 <- transform(DF, Name_Article = sub(".*_", "", Name_Article))[-3]
    z <- read.zoo(DF2, index = "Week", split = "Name_Article", FUN = identity)
    

    giving:

    > z
      Spending.A Unit_Price.A Spending.B Unit_Price.B Spending.C Unit_Price.C
    1     196.05        3.376     377.04     1.004867         NA           NA
    2    2979.40        3.376         NA           NA     353.44     3.034444
    3     196.05        3.376     377.04     1.004867         NA           NA
    4     160.99        3.376     950.22     1.441164         NA           NA
      Spending.D Unit_Price.D Spending.E Unit_Price.E
    1         NA           NA         NA           NA
    2     160.99     0.653621     950.22     1.441164
    3    2979.40     0.653621     353.44     1.441164
    4         NA           NA         NA           NA
    

    Note

    The data in reproducible form is assumed to be:

    Lines <- "
        Name_Article Week Num_Any Spending Unit_Price
    1      Product_A   1    2016   196.05     3.376000
    2      Product_B   1    2016   377.04     1.004867
    3      Product_A   2    2016  2979.40     3.376000
    4      Product_C   2    2016   353.44     3.034444
    5      Product_D   2    2016   160.99     0.653621
    6      Product_E   2    2016   950.22     1.441164
    7      Product_A   3    2016   196.05     3.376000
    8      Product_B   3    2016   377.04     1.004867
    9      Product_D   3    2016  2979.40     0.653621
    10     Product_E   3    2016   353.44     1.441164
    11     Product_A   4    2016   160.99     3.376000
    12     Product_B   4    2016   950.22     1.441164"
    DF <- read.table(text = Lines, header = TRUE)