I have a data frame I want to convert to a time series. Problem is for each date I have multiple products. It looks like
Name_Article Week Num_Any Spending Unit_Price
1 Product_A 1 2016 196.05 3.376000
2 Product_B 1 2016 377.04 1.004867
3 Product_A 2 2016 2979.40 3.376000
4 Product_C 2 2016 353.44 3.034444
5 Product_D 2 2016 160.99 0.653621
6 Product_E 2 2016 950.22 1.441164
7 Product_A 3 2016 196.05 3.376000
8 Product_B 3 2016 377.04 1.004867
9 Product_D 3 2016 2979.40 0.653621
10 Product_E 3 2016 353.44 1.441164
11 Product_A 4 2016 160.99 3.376000
12 Product_B 4 2016 950.22 1.441164
I know that working with weekly time series is not the best option, but I have no choice. My idea is to get something like
Week Spending.A UnitPrice.A Spending.B UnitPrice.B Spending.C UnitPrice.C ...
1 196.05 3.376000 377.04 1.004867 0.00 3.034444
2 2979.40 3.376000 0.00 1.004867 353.44 3.034444
3 120.05 3.376000 377.04 1.004867 0.00 3.950000
4 160.99 3.500000 950.22 1.441164 ...
I can't get my head around the tydir functions gather()
and spread()
. Any help will be appreciated!
In case you were wondering, the goal of all this is to preform a hierarchical forecasting , but before I begin I need my data to be structurated.
Many thanks!
Remove everything up to the underscore in Name_Article
and drop the year column. Then read it as a zoo object splitting by Name_Article
and specifying the Week
is the index.
If you need a specific form then various conversions are possible such as as.ts(z)
, fortify.zoo(z)
, coredata(z)
, index(z)
and zoo(coredata(z))
.
library(zoo)
DF2 <- transform(DF, Name_Article = sub(".*_", "", Name_Article))[-3]
z <- read.zoo(DF2, index = "Week", split = "Name_Article", FUN = identity)
giving:
> z
Spending.A Unit_Price.A Spending.B Unit_Price.B Spending.C Unit_Price.C
1 196.05 3.376 377.04 1.004867 NA NA
2 2979.40 3.376 NA NA 353.44 3.034444
3 196.05 3.376 377.04 1.004867 NA NA
4 160.99 3.376 950.22 1.441164 NA NA
Spending.D Unit_Price.D Spending.E Unit_Price.E
1 NA NA NA NA
2 160.99 0.653621 950.22 1.441164
3 2979.40 0.653621 353.44 1.441164
4 NA NA NA NA
The data in reproducible form is assumed to be:
Lines <- "
Name_Article Week Num_Any Spending Unit_Price
1 Product_A 1 2016 196.05 3.376000
2 Product_B 1 2016 377.04 1.004867
3 Product_A 2 2016 2979.40 3.376000
4 Product_C 2 2016 353.44 3.034444
5 Product_D 2 2016 160.99 0.653621
6 Product_E 2 2016 950.22 1.441164
7 Product_A 3 2016 196.05 3.376000
8 Product_B 3 2016 377.04 1.004867
9 Product_D 3 2016 2979.40 0.653621
10 Product_E 3 2016 353.44 1.441164
11 Product_A 4 2016 160.99 3.376000
12 Product_B 4 2016 950.22 1.441164"
DF <- read.table(text = Lines, header = TRUE)