I have a dataframe similar than this following random data:
data<- data.frame(
id = c('TEC', 'TEC', 'TEC', 'TEC', 'FIL', 'FIL', 'FIL'),
date = c('2009-10', '2009-11', '2009-12', '2010-04', '2000-05', '2006-10', '2006-11')
)
data$date <- ym(data$date)
set.seed(123)
data$re <- sample(1:100, nrow(data), replace = TRUE)
Now, i need to create a new column in which we have to put numbers based on groups. I mean, first, i need to group by id. Then, in each id i need groups based on consecutives dates. For example, in this example, the first id=TEC has first date in 2009-10, the second date is consecutive (month by month) 2009-11, the third is also consecutive, however, the fourth date is not consecutive. So i need to put the firs three observations into group 1 and the fourth into group 2. Then, with the next id (FIL). The first observation needs to be in group 1, and the second date and third date are consecutives, so in group 2.
I need to generalize this for a huge dataframe, but with the same format.
The final output (in this random sample) needs to be like:
Thanks!
Here is a dplyr
way:
library(dplyr)
data %>%
group_by(id) %>%
mutate(is_consecutive = (date == lag(date) %m+% months(1))) %>%
mutate(group = cumsum(!is_consecutive == TRUE | is.na(is_consecutive)), .keep = "unused") %>%
ungroup()
id date re group
<chr> <date> <int> <int>
1 TEC 2009-10-01 31 1
2 TEC 2009-11-01 79 1
3 TEC 2009-12-01 51 1
4 TEC 2010-04-01 14 2
5 FIL 2000-05-01 67 1
6 FIL 2006-10-01 42 2
7 FIL 2006-11-01 50 2