Tags: r, dataframe, data.table, cumsum

R - Cumsum of product over rolling windows (quarters)


I want to calculate the cumulative sum of the column value with a special requirement. After a quarter (= 3 months) has passed, the value is depreciated by a constant, say 0.9, meaning it now factors into the cumsum with only 90% of its original value. When another quarter has passed, this is repeated, i.e. the 90% is multiplied by the constant (0.9) again, which equals the original value * 0.9^2. This procedure is repeated for every quarter that passes, so after a year 0.9^4 of the original value goes into the cumsum, after two years 0.9^8, and so on. The original value only factors fully into the cumsum during the first quarter following its date.

Example: a value of 100 on 01/15 would add to the cumsum with 100 until 04/15, with 100*0.9=90 between 04/15 and 07/15, with 100*0.9^2=81 between 07/15 and 10/15, and so on; see the desired_output column of the MRE.
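
To spell out the arithmetic of that example, the schedule for a single value of 100 is just the constant raised to the number of quarters passed (a standalone illustration, not part of the MRE):

# depreciation schedule for one value of 100, assuming the constant 0.9
quarters_passed <- 0:3
100 * 0.9^quarters_passed
# [1] 100.0  90.0  81.0  72.9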

MRE:

library(tibble)
df <- tribble(~date, ~value, ~country, ~desired_output,
              "2017-01-01", 2, "US", 2,
              "2017-01-05", 2, "UK", 8,
              "2017-01-05", 4, "US", 8,
              "2017-04-01", 5, "IT", 12.8,  # only the first observation is older than a quarter so calculate as (2*0.9+2+4+5)
              "2017-04-03", 3, "US", 115.8, #(2*0.9+2+4+5+3+100)
              "2017-04-03", 100, "US", 115.8,
              "2017-04-11", 20, "UK", 135.2,  # now the first three observations are older than a quarter (8*0.9+5+3+100+20)
              "2017-04-15", 6, "US", 141.2,
              "2017-07-02", 30, "US", 170.52,  # now the first observation is older than two quarters and the second, third, and forth observation are older than a quarter (2*0.9^2+11*0.9+3+100+20+6+30)
              "2017-10-12", 4, "UK", 151.912,  # ((2+2+4)*0.9^3+(5+3+100+20)*0.9^2+(6+30)*0.9+4+6)
              "2017-10-12", 6, "IT", 151.912) # (8*0.9^3+128*0.9^2+36*0.9+10)

assuming the constant has a value of 0.9 and the format of the date is %Y-%m-%d.


Pseudo-code to further clarify the question (a rough sketch implementing these steps follows the list):

  1. Take the cumsum of each quarter / 3-month period (counted back from the date of the current row)

  2. Exponentiate the constant by the distance of that period from the current row's date (e.g. C^2 if the period lies two quarters back)

  3. Multiply this adjusted constant by the cumsum of the respective quarter (3-month period)

  4. Take the cumsum of all the quarter cumsums
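
A rough sketch of these steps (assuming the df from the MRE above, a hypothetical helper depreciated_cumsum(), and a quarter approximated as 90 days rather than 3 calendar months):

library(tibble)
library(dplyr)

depreciated_cumsum <- function(date, value, constant = 0.9, quarter_days = 90) {
  sapply(seq_along(date), function(i) {
    days_back <- as.numeric(date[i] - date[seq_len(i)])  # days since each earlier (or current) row
    nqtr <- floor(days_back / quarter_days)              # full quarters passed per observation
    sum(value[seq_len(i)] * constant^nqtr)               # depreciate each value, then sum
  })
}

df %>%
  mutate(date = as.Date(date)) %>%
  mutate(do = depreciated_cumsum(date, value))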


As a next step, I would also want to compute this specific cumsum by a grouping variable (e.g. country).


Solution

  • Here is an option:

    DT[, do := 
        # non-equi self-join: for each row, take all rows with a date on or before it
        .SD[.SD, on=.(date<=date), by=.EACHI, {
            # number of elapsed 90-day "quarters" between each earlier row and the current date
            nqtr <- floor(pmax(0, i.date - x.date) / 90)
            # depreciate each value by 0.9 per quarter passed, then sum
            sum(value * 0.9^nqtr)
        }]$V1
    ]
    

    output:

              date value country desired_output      do
     1: 2017-01-01     2      US          2.000   2.000
     2: 2017-01-05     2      UK          8.000   8.000
     3: 2017-01-05     4      US          8.000   8.000
     4: 2017-04-01     5      IT         12.800  12.800
     5: 2017-04-03     3      US        115.800 115.800
     6: 2017-04-03   100      US        115.800 115.800
     7: 2017-04-11    20      UK        135.200 135.200
     8: 2017-04-15     6      US        141.200 141.200
     9: 2017-07-02    30      US        170.520 160.220
    10: 2017-10-12     4      UK        151.912 151.372
    11: 2017-10-12     6      IT        151.912 151.372
    

    The difference is in how we define a quarter. I used 90 days; if 3 calendar months is really important, I will update the post. For example, on 2017-07-02 rows 2 - 6 are 1 quarter back when using 90 days, whereas in your OP only rows 2 - 4 are 1 quarter back when using 3 months.
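
    For illustration only (not part of the original answer), here is a quick check of that boundary on 2017-07-02, using the dates from the example data:

    ref <- as.Date("2017-07-02")
    xs  <- as.Date(c("2017-01-05", "2017-04-01", "2017-04-03"))
    floor(as.numeric(ref - xs) / 90)            # 90-day rule: 1 1 1, all one quarter back
    seq(ref, by = "-3 months", length.out = 2)  # calendar rule: current quarter starts 2017-04-02,
                                                # so 2017-04-03 still counts as the current quarter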

    data:

    library(data.table)    
    DT <- fread('date,value,country,desired_output
    "2017-01-01", 2, "US", 2
    "2017-01-05", 2, "UK", 8
    "2017-01-05", 4, "US", 8
    "2017-04-01", 5, "IT", 12.8  
    "2017-04-03", 3, "US", 115.8 
    "2017-04-03", 100, "US", 115.8
    "2017-04-11", 20, "UK", 135.2
    "2017-04-15", 6, "US", 141.2
    "2017-07-02", 30, "US", 170.52
    "2017-10-12", 4, "UK", 151.912
    "2017-10-12", 6, "IT", 151.912')
    DT[, date := as.IDate(date, format="%Y-%m-%d")]
    

    Handling 3m and country requirement:

    DT[, do := 
        # non-equi self-join, additionally matched on country
        .SD[.SD, on=.(country, date<=date), by=.EACHI, {
            # calendar-quarter boundaries counting back from the current row's date,
            # extended far enough to cover the earliest matched date
            vec <- rev(seq(i.date, min(x.date)-93L, by="-1 quarter"))
            # locate each matched date within those boundaries ...
            itvl <- findInterval(x.date, vec, rightmost.closed=TRUE)
            # ... and convert that to the number of quarters back from the current date
            nqtr <- length(vec) - itvl - 1L
            sum(value * 0.9^nqtr)
        }]$V1
    ]
    

    output:

              date value country desired_output      do
     1: 2017-01-01     2      US          2.000   2.000
     2: 2017-01-05     2      UK          8.000   8.000
     3: 2017-01-05     4      US          8.000   8.000
     4: 2017-04-01     5      IT         12.800  13.000
     5: 2017-04-03     3      US        115.800 115.800
     6: 2017-04-03   100      US        115.800 115.800
     7: 2017-04-11    20      UK        135.200 135.200
     8: 2017-04-15     6      US        141.200 141.200
     9: 2017-07-02    30      US        170.520 170.520
    10: 2017-10-12     4      UK        151.912 151.912
    11: 2017-10-12     6      IT        151.912 151.912