Here's the tweak to my previously posted question. Here's my data:
set.seed(3737)
DF2 = data.frame(user_id = c(rep(27, 7), rep(11, 7)),
date = as.Date(rep(c('2016-01-01', '2016-01-03', '2016-01-05', '2016-01-07', '2016-01-10', '2016-01-14', '2016-01-16'), 2)),
value = round(rnorm(14, 15, 5), 1))
user_id date value
27 2016-01-01 15.0
27 2016-01-03 22.4
27 2016-01-05 13.3
27 2016-01-07 21.9
27 2016-01-10 20.6
27 2016-01-14 18.6
27 2016-01-16 16.4
11 2016-01-01 6.8
11 2016-01-03 21.3
11 2016-01-05 19.8
11 2016-01-07 22.0
11 2016-01-10 19.4
11 2016-01-14 17.5
11 2016-01-16 19.3
This time, I'd like to calculate cumulative sum of a value
for each user_id
for the specified time period'; e.g. last 7, 14 days. The desirable solution would look like this:
user_id date value v_minus7 v_minus14
27 2016-01-01 15.0 15.0 15.0
27 2016-01-03 22.4 37.4 37.4
27 2016-01-05 13.3 50.7 50.7
27 2016-01-07 21.9 72.6 72.6
27 2016-01-10 20.6 78.2 93.2
27 2016-01-14 18.6 61.1 111.8
27 2016-01-16 16.4 55.6 113.2
11 2016-01-01 6.8 6.8 6.8
11 2016-01-03 21.3 28.1 28.1
11 2016-01-05 19.8 47.9 47.9
11 2016-01-07 22.0 69.9 69.9
11 2016-01-10 19.4 82.5 89.3
11 2016-01-14 17.5 58.9 106.8
11 2016-01-16 19.3 56.2 119.3
Ideally, I'd like to use dplyr
for this, but other packages would be fine.
logic : first group by user_id
, followed by date
. Now for each subset of data, we are checking which all dates lie between the current date and 7/14 days back using between()
which returns a logical vector.
Based on this logical vector I add the value
column
library(data.table)
setDT(DF2)[, `:=`(v_minus7 = sum(DF2$value[DF2$user_id == user_id][between(DF2$date[DF2$user_id == user_id], date-7, date, incbounds = TRUE)]),
v_minus14 = sum(DF2$value[DF2$user_id == user_id][between(DF2$date[DF2$user_id == user_id], date-14, date, incbounds = TRUE)])),
by = c("user_id", "date")][]
# user_id date value v_minus7 v_minus14
#1: 27 2016-01-01 15.0 15.0 15.0
#2: 27 2016-01-03 22.4 37.4 37.4
#3: 27 2016-01-05 13.3 50.7 50.7
#4: 27 2016-01-07 21.9 72.6 72.6
#5: 27 2016-01-10 20.6 78.2 93.2
#6: 27 2016-01-14 18.6 61.1 111.8
#7: 27 2016-01-16 16.4 55.6 113.2
#8: 11 2016-01-01 6.8 6.8 6.8
#9: 11 2016-01-03 21.3 28.1 28.1
#10: 11 2016-01-05 19.8 47.9 47.9
#11: 11 2016-01-07 22.0 69.9 69.9
#12: 11 2016-01-10 19.4 82.5 89.3
#13: 11 2016-01-14 17.5 58.9 106.8
#14: 11 2016-01-16 19.3 56.2 119.3
# from alexis_laz answer.
ff = function(date, value, minus){
cs = cumsum(value)
i = findInterval(date - minus, date, rightmost.closed = TRUE)
w = which(as.logical(i))
i[w] = cs[i[w]]
cs - i
}
setDT(DF2)
DF2[, `:=`( v_minus7 = ff(date, value, 7),
v_minus14 = ff(date, value, 14)), by = c("user_id")]