I'm dealing with dates and I wanted to group some rows together but I can't find how. In my data, one row is an individual in a time interval and in a place. Something like that :
ind place start_date end_date
<int> <int> <date> <date>
1 1 1 2011-01-04 2011-01-05
2 1 1 2011-01-05 2011-01-06
3 1 1 2011-01-10 2011-01-11
4 1 4 2010-12-30 2010-12-31
5 1 4 2010-12-31 2011-01-01
6 1 4 2011-01-01 2011-01-03
7 2 2 2018-02-17 2018-02-19
8 2 2 2018-02-19 2018-02-23
9 3 3 2018-02-08 2018-02-13
10 3 3 2018-02-13 2018-02-16
11 3 3 2018-02-16 2018-02-20
12 3 3 2018-03-27 2018-03-29
As you can see, sometimes, start_date
is the same as in previous end_date
. I can group_by
ind
and place
in order to merge time intervals, but the problem is that there are some rows where there is a gap between start_date
and previous end_date
. So I need to divide these groups in multiple other groups before merging.
What I want to obtain first is :
ind place start_date end_date group
<int> <int> <date> <date> <int>
1 1 1 2011-01-04 2011-01-05 1
2 1 1 2011-01-05 2011-01-06 1
3 1 1 2011-01-10 2011-01-11 2
4 1 4 2010-12-30 2010-12-31 3
5 1 4 2010-12-31 2011-01-01 3
6 1 4 2011-01-01 2011-01-03 3
7 2 2 2018-02-17 2018-02-19 4
8 2 2 2018-02-19 2018-02-23 4
9 3 3 2018-02-08 2018-02-13 5
10 3 3 2018-02-13 2018-02-16 5
11 3 3 2018-02-16 2018-02-20 5
12 3 3 2018-03-27 2018-03-29 6
And at the end :
ind place start_date end_date
<chr> <chr> <date> <date>
1 1 1 2011-01-04 2011-01-06
2 1 1 2011-01-10 2011-01-11
3 1 4 2010-12-30 2011-01-03
4 2 2 2018-02-17 2018-02-23
5 3 3 2018-02-08 2018-02-20
6 3 3 2018-03-27 2018-03-29
Do you have an idea of how can I do it ?
Thanks !
One way is to add an additional grouping variable that checks if end_date is less then start_date
library(dplyr)
df %>%
group_by(grp = cumsum(lag(end_date, default=end_date[1]) < start_date),
ind, place) %>%
summarize(start_date = first(start_date),
end_date = last(end_date), .groups="drop") %>%
select(-grp)
# A tibble: 6 × 4
ind place start_date end_date
<int> <int> <date> <date>
1 1 1 2011-01-04 2011-01-06
2 1 1 2011-01-10 2011-01-11
3 1 4 2010-12-30 2011-01-03
4 2 2 2018-02-17 2018-02-23
5 3 3 2018-02-08 2018-02-20
6 3 3 2018-03-27 2018-03-29