Search code examples
rdatedata.tablereshapelubridate

Expand quarterly rows to multiple monthly rows


I have a dataset with a column referring to date. The date is written in a quarterly manner (i.e. takes each 3 months as 1 unit). One column indicates whether the date is in fact quarterly or monthly taken. I want only to deal with quarterly types.

I would like to expand each quarterly row to 3 monthly rows. Another column, called "gains" will be affected too and shall be divided by "3" (i.e. the number of months in a quarter).

Example:

example <- data.frame(quarterly_reports = as.Date(as.character(c(20200331,20200630, 20200930,20201231, 20210131)), "%Y%m%d"),
                      type = c("q","q","q","q","m"),
                      gains = c(18000, 30000, 45000, 60000, 10000))

This is how it looks like:

  quarterly_reports type gains
1        2020-03-31    q 18000
2        2020-06-30    q 30000
3        2020-09-30    q 45000
4        2020-12-31    q 60000
5        2021-01-31    m 10000

What I want is an output similar to this (and note how I divided the gains column by 3):

   quarterly_reports type gains
1         2020-01-31    m  6000
2         2020-02-28    m  6000
3         2020-03-31    m  6000
4         2020-04-30    m 10000
5         2020-05-31    m 10000
6         2020-06-30    m 10000
7         2020-07-31    m 15000
8         2020-08-31    m 15000
9         2020-09-30    m 15000
10        2020-10-31    m 20000
11        2020-11-30    m 20000
12        2020-12-31    m 20000
13        2021-01-31    m 10000

Note: I'm using data.table, and I'm trying to utilize lubridate in converting the dates from quarterly to monthly. I have about 3 millions of these rows, so I'm looking for something fast and dirty.

Any reply is appreciated.


Solution

  • Update: based on @Henrik's suggestions, here is a much-more efficient alternative:

    library(lubdridate)
    newexample <- example[type == 'q', .(
      quarterly_reports = quarterly_reports %m-% months(rep(0:2, .N)),
      type = "m",
      gains = gains/3) ]
    setorder(newexample, quarterly_reports)
    newexample
    #     quarterly_reports   type gains
    #                <Date> <char> <num>
    #  1:        2020-01-31      m  6000
    #  2:        2020-02-29      m  6000
    #  3:        2020-03-31      m  6000
    #  4:        2020-04-30      m 10000
    #  5:        2020-05-30      m 10000
    #  6:        2020-06-30      m 10000
    #  7:        2020-07-30      m 15000
    #  8:        2020-08-30      m 15000
    #  9:        2020-09-30      m 15000
    # 10:        2020-10-31      m 20000
    # 11:        2020-11-30      m 20000
    # 12:        2020-12-31      m 20000
    

    (This is likely much slower, kept for posterity.)

    I don't know that this saves you a lot of efficiency in terms of data.table's normal referential semantics, since it has to create a lot of rows (which I think are not done in-place). Either way,

    library(zoo)
    library(data.table)
    setDT(example)
    
    newexample <- example[type == "q",][,rn:=seq_len(nrow(.SD))][, .(
      quarterly_reports = as.Date(seq(as.yearmon(quarterly_reports), length.out = 3, by = -1/12), frac = 1),
      type = rep("m", 3),
      gains = rep(gains, 3) / 3
    ), by = rn ][,rn:=NULL]
    newexample <- rbindlist(list(newexample, example[type != "q",]))
    setorder(newexample, "quarterly_reports")
    newexample[]
    #     quarterly_reports   type gains
    #                <Date> <char> <num>
    #  1:        2020-01-31      m  6000
    #  2:        2020-02-29      m  6000
    #  3:        2020-03-31      m  6000
    #  4:        2020-04-30      m 10000
    #  5:        2020-05-31      m 10000
    #  6:        2020-06-30      m 10000
    #  7:        2020-07-31      m 15000
    #  8:        2020-08-31      m 15000
    #  9:        2020-09-30      m 15000
    # 10:        2020-10-31      m 20000
    # 11:        2020-11-30      m 20000
    # 12:        2020-12-31      m 20000
    # 13:        2021-01-31      m 10000
    

    (The use of zoo::as.yearmon suggested by @G.Grothendieck previous answer, https://stackoverflow.com/a/47634838/3358272, in order to keep the dates as the last day of each month.)