Problem statement:
You are given with the following data:
To find:
Expected output (the last column):
customer_id product purchase_time total_to_date time_from_last_purchase
1 A 2014-11-24 1 0
1 A 2018-02-21 2 1185
1 E 2014-01-08 1 0
2 J 2016-04-18 1 0
3 F 2017-06-12 1 0
3 G 2017-06-23 1 0
4 F 2017-09-27 1 0
4 F 2018-01-08 2 103
4 F 2018-02-08 3 31
4 F 2018-02-09 4 1
4 F 2018-04-10 5 60
My approach:
I am quite new to R, so any help is greatly appreciated. Thank you!
With dplyr
, you can try:
df %>%
group_by(customer_id, product) %>%
mutate(purchase_time = as.Date(purchase_time, format = "%Y-%m-%d"),
res = purchase_time - lag(purchase_time, default = first(purchase_time)))
customer_id product purchase_time total_to_date res
<int> <chr> <date> <int> <time>
1 1 A 2014-11-24 1 0 days
2 1 A 2018-02-21 2 1185 days
3 1 E 2014-01-08 1 0 days
4 2 J 2016-04-18 1 0 days
5 3 F 2017-06-12 1 0 days
6 3 G 2017-06-23 1 0 days
7 4 F 2017-09-27 1 0 days
8 4 F 2018-01-08 2 103 days
9 4 F 2018-02-08 3 31 days
10 4 F 2018-02-09 4 1 days
11 4 F 2018-04-10 5 60 days
Or if you need the results as a numeric variable:
df %>%
group_by(customer_id, product) %>%
mutate(purchase_time = as.Date(purchase_time, format = "%Y-%m-%d"),
res = as.numeric(purchase_time - lag(purchase_time, default = first(purchase_time))))
customer_id product purchase_time total_to_date res
<int> <chr> <date> <int> <dbl>
1 1 A 2014-11-24 1 0
2 1 A 2018-02-21 2 1185
3 1 E 2014-01-08 1 0
4 2 J 2016-04-18 1 0
5 3 F 2017-06-12 1 0
6 3 G 2017-06-23 1 0
7 4 F 2017-09-27 1 0
8 4 F 2018-01-08 2 103
9 4 F 2018-02-08 3 31
10 4 F 2018-02-09 4 1
11 4 F 2018-04-10 5 60