I have a very large data set that I would like to aggregate both by week/month and by product(a few thousand). Is there a way to do so with a data set in the following format?
Date product product2 product3
03/03/2011 1 0 7
04/08/2011 3 8 2
03/05/2015 6 3 89
03/01/2017 1 0 2
03/03/2017 6 1 6
which would yield the following:
Date product product2 product3
wk1-032011 1 0 7
wk2-042011 3 8 2
wk1-032015 6 3 89
wk1-032017 7 1 8
df <- structure(list(Date = c("03/03/2011", "04/04/2011", "03/05/2015", "03/01/2017", "03/03/2017"),
product= c(1L, 3L, 6L, 1L, 6L),
product2= c(0L, 8L, 3L, 0L, 1L),
product3= c(7L, 2L, 89L, 2L, 6L)),
.Names= c("Date", "product", "product2", "product3"),
class= "data.frame", row.names=c(NA, -5L))
In base R, you can use as.Date
to convert your character df$Date into a Date variable and then use format
to with the proper formatting to convert the date into a character variable indicating weekly dates. aggregate
is then used to perform the aggregation by the new variable.
aggregate(df[2:4], list("weeks"=format(as.Date(df$Date, "%m/%d/%Y"), "%Y-%W")), FUN=sum)
weeks product product2 product3
1 2011-09 1 0 7
2 2011-14 3 8 2
3 2015-09 6 3 89
4 2017-09 7 1 8
See ?strptime
for other date conversions.
As @akrun mentions in the comments, the data.table
analog to the above base R code is
library(data.table)
setDT(df)[, lapply(.SD, sum),
by=.(weeks = format(as.IDate(Date, "%m/%d/%Y"), "%Y-%W"))]
Here, setDT
converts the data.frame into a data.table, lapply...
calculates the sum where .SD
stands for the data.table. This sum is calculated by each unique element that is produced from format(as.IDate(Date, "%m/%d/%Y"), "%Y-%W")
where the conversion uses data.table's as.IDate
in place of the base R as.Date
.