I have a dataset with a column referring to date. The date is written in a quarterly manner (i.e. takes each 3 months as 1 unit). One column indicates whether the date is in fact quarterly or monthly taken. I want only to deal with quarterly types.
I would like to expand each quarterly row to 3 monthly rows. Another column, called "gains" will be affected too and shall be divided by "3" (i.e. the number of months in a quarter).
Example:
example <- data.frame(quarterly_reports = as.Date(as.character(c(20200331,20200630, 20200930,20201231, 20210131)), "%Y%m%d"),
type = c("q","q","q","q","m"),
gains = c(18000, 30000, 45000, 60000, 10000))
This is how it looks like:
quarterly_reports type gains
1 2020-03-31 q 18000
2 2020-06-30 q 30000
3 2020-09-30 q 45000
4 2020-12-31 q 60000
5 2021-01-31 m 10000
What I want is an output similar to this (and note how I divided the gains column by 3):
quarterly_reports type gains
1 2020-01-31 m 6000
2 2020-02-28 m 6000
3 2020-03-31 m 6000
4 2020-04-30 m 10000
5 2020-05-31 m 10000
6 2020-06-30 m 10000
7 2020-07-31 m 15000
8 2020-08-31 m 15000
9 2020-09-30 m 15000
10 2020-10-31 m 20000
11 2020-11-30 m 20000
12 2020-12-31 m 20000
13 2021-01-31 m 10000
Note: I'm using data.table
, and I'm trying to utilize lubridate
in converting the dates from quarterly to monthly. I have about 3 millions of these rows, so I'm looking for something fast and dirty.
Any reply is appreciated.
Update: based on @Henrik's suggestions, here is a much-more efficient alternative:
library(lubdridate)
newexample <- example[type == 'q', .(
quarterly_reports = quarterly_reports %m-% months(rep(0:2, .N)),
type = "m",
gains = gains/3) ]
setorder(newexample, quarterly_reports)
newexample
# quarterly_reports type gains
# <Date> <char> <num>
# 1: 2020-01-31 m 6000
# 2: 2020-02-29 m 6000
# 3: 2020-03-31 m 6000
# 4: 2020-04-30 m 10000
# 5: 2020-05-30 m 10000
# 6: 2020-06-30 m 10000
# 7: 2020-07-30 m 15000
# 8: 2020-08-30 m 15000
# 9: 2020-09-30 m 15000
# 10: 2020-10-31 m 20000
# 11: 2020-11-30 m 20000
# 12: 2020-12-31 m 20000
(This is likely much slower, kept for posterity.)
I don't know that this saves you a lot of efficiency in terms of data.table
's normal referential semantics, since it has to create a lot of rows (which I think are not done in-place). Either way,
library(zoo)
library(data.table)
setDT(example)
newexample <- example[type == "q",][,rn:=seq_len(nrow(.SD))][, .(
quarterly_reports = as.Date(seq(as.yearmon(quarterly_reports), length.out = 3, by = -1/12), frac = 1),
type = rep("m", 3),
gains = rep(gains, 3) / 3
), by = rn ][,rn:=NULL]
newexample <- rbindlist(list(newexample, example[type != "q",]))
setorder(newexample, "quarterly_reports")
newexample[]
# quarterly_reports type gains
# <Date> <char> <num>
# 1: 2020-01-31 m 6000
# 2: 2020-02-29 m 6000
# 3: 2020-03-31 m 6000
# 4: 2020-04-30 m 10000
# 5: 2020-05-31 m 10000
# 6: 2020-06-30 m 10000
# 7: 2020-07-31 m 15000
# 8: 2020-08-31 m 15000
# 9: 2020-09-30 m 15000
# 10: 2020-10-31 m 20000
# 11: 2020-11-30 m 20000
# 12: 2020-12-31 m 20000
# 13: 2021-01-31 m 10000
(The use of zoo::as.yearmon
suggested by @G.Grothendieck previous answer, https://stackoverflow.com/a/47634838/3358272, in order to keep the dates as the last day of each month.)