I observe the number of purchases of different customers on a few days (1 = purchase, 0 = no purchase). Now for each day, I want to sum up the number of purchases of a specific customer during the preceding 2 days and the current day, so for 3 days in total.
Example data:
da <- data.frame(customer_id = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4),
day = c("2016-04-11","2016-04-12","2016-04-13","2016-04-14","2016-04-15","2016-04-11","2016-04-12","2016-04-13","2016-04-14","2016-04-15","2016-04-11","2016-04-12","2016-04-13","2016-04-14","2016-04-15","2016-04-11","2016-04-12","2016-04-13","2016-04-14","2016-04-15"),
purchase = c(1,1,1,0,1,0,0,1,1,0,1,1,1,1,0,1,0,1,0,1))
> da
customer_id day purchase
1 1 2016-04-11 1
2 1 2016-04-12 1
3 1 2016-04-13 1
4 1 2016-04-14 0
5 1 2016-04-15 1
6 2 2016-04-11 0
7 2 2016-04-12 0
8 2 2016-04-13 1
9 2 2016-04-14 1
10 2 2016-04-15 0
11 3 2016-04-11 1
12 3 2016-04-12 1
13 3 2016-04-13 1
14 3 2016-04-14 1
15 3 2016-04-15 0
16 4 2016-04-11 1
17 4 2016-04-12 0
18 4 2016-04-13 1
19 4 2016-04-14 0
20 4 2016-04-15 1
The output I am looking for:
output_da <- data.frame(customer_id = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4),
day = c("2016-04-11","2016-04-12","2016-04-13","2016-04-14","2016-04-15","2016-04-11","2016-04-12","2016-04-13","2016-04-14","2016-04-15","2016-04-11","2016-04-12","2016-04-13","2016-04-14","2016-04-15","2016-04-11","2016-04-12","2016-04-13","2016-04-14","2016-04-15"),
purchase = c(1,1,1,0,1,0,0,1,1,0,1,1,1,1,0,1,0,1,0,1),
purchases_last_3_days = c(1,2,3,2,2,0,0,1,2,2,1,2,3,3,2,1,1,2,1,2))
output_da
customer_id day purchase purchases_last_3_days
1 1 2016-04-11 1 1
2 1 2016-04-12 1 2
3 1 2016-04-13 1 3
4 1 2016-04-14 0 2
5 1 2016-04-15 1 2
6 2 2016-04-11 0 0
7 2 2016-04-12 0 0
8 2 2016-04-13 1 1
9 2 2016-04-14 1 2
10 2 2016-04-15 0 2
11 3 2016-04-11 1 1
12 3 2016-04-12 1 2
13 3 2016-04-13 1 3
14 3 2016-04-14 1 3
15 3 2016-04-15 0 2
16 4 2016-04-11 1 1
17 4 2016-04-12 0 1
18 4 2016-04-13 1 2
19 4 2016-04-14 0 1
20 4 2016-04-15 1 2
I know about the cumsum function, but I don't know how to select the X number of rows precending every current day.
If you need only the last 3 rows, you can do it manually:
library(dplyr)
output_da <- da %>%
group_by(customer_id) %>%
mutate(pday = lag(purchase, default = 0),
apday = lag(purchase, n = 2, default = 0),
purchases_last_3_days = rowSums(across(c(purchase, pday, apday))),
pday = NULL, apday = NULL)
UPDATE
And if you need more than the last 3 rows, it is possible to automatize a bit, changing the 3 by the suitable number:
library(dplyr)
library(purrr)
last_day_rows <- 3
da %>%
group_by(customer_id) %>%
mutate(purchases_last_3_days =
rowSums(map_dfc(c(1:last_day_rows),
~lag(purchase, n = .x - 1, default = 0))))