I have 10000 id's, which all have a different number of purchase dates (with a minimum of 1). Per customer id, I want to find out if the number of days between the purchase dates exceeds 183 days (which is half a year). I only need a yes or no (binary) answer. The code I have for now is the following, however it does not work
for (id in 10000){
for (i in length(purchase_date[id])){
for (j in length(purchase_date[id])-1){
if (as.numeric(purchase_date[i]-purchase_date[j])>=183){
above_half_year <- 1
}else{
above_half_year <- 0
}
}
}
}
The top of the dataset is as follows
ID purchase_date
1 1 2014-01-13
2 1 2014-04-14
3 1 2014-08-13
4 1 2014-09-12
5 1 2014-11-12
6 1 2015-02-13
7 1 2017-02-14
8 1 2018-12-13
9 1 2019-04-15
10 2 2016-03-01
11 3 2016-06-13
12 3 2016-09-20
13 3 2016-10-20
14 3 2016-11-21
15 3 2016-12-20
The expected output would be
ID purchase_date above_half_year
1 1 2014-01-13 0
2 1 2014-04-14 0
3 1 2014-08-13 0
4 1 2014-09-12 0
5 1 2014-11-12 0
6 1 2015-02-13 0
7 1 2017-02-14 1
8 1 2018-12-13 1
9 1 2019-04-15 0
10 2 2016-03-01 0
11 3 2016-06-13 0
12 3 2016-09-20 0
13 3 2016-10-20 0
14 3 2016-11-21 0
15 3 2016-12-20 0
Thank you in advance for your help! I look forward to your suggestions
This is an approach with dplyr
df %>%
group_by(ID) %>%
mutate(above_half_year = if_else(purchase_date - lag(purchase_date) > 183, 1, 0))