My problem is easy to explain :
Table with rows where dates follow
I tried to use for loops that compare the two vectors of dates (vectors extracted from the columns) but it does not really work...
I tried something like this to identify start date and end date :
'''
a = sort(data$Date_debut)
b = sort(data$Date_fin)
for(i in 1:(length(a)-1)){
for(j in 2:length(a)){
datedeb = a[j-1]
if(b[i]+1 == a[j]){
while(b[i]+1 == a[j] ){
datefin = b[i+1]
i = i+1}
}
}
}
''' datedeb = start date datefin = end date
Thank you for your help, I am open to ideas / ways to deal with this.
Here is one approach using tidyverse
. For each Var1
group, create subgroups containing an index based on when the start date does not equal the previous row end date (keeping those rows together with the same index). Then you can group_by
both the Var1
and the index together, and use the first
start date and last
end date as your date ranges.
library(tidyverse)
df %>%
group_by(Var1) %>%
mutate(i = cumsum(Start_date != lag(End_date, default = as.Date(-Inf)) + 1)) %>%
group_by(i, .add = T) %>%
summarise(Start_date = first(Start_date), End_date = last(End_date)) %>%
select(-i)
Output
Var1 Start_date End_date
<chr> <date> <date>
1 A 2019-01-02 2019-04-09
2 A 2019-10-11 2019-10-11
3 B 2019-12-03 2019-12-20
4 C 2019-12-29 2019-12-31
Data
df <- structure(list(Var1 = c("A", "A", "A", "A", "B", "C"), Start_date = structure(c(17898,
17962, 17993, 18180, 18233, 18259), class = "Date"), End_date = structure(c(17961,
17992, 17995, 18180, 18250, 18261), class = "Date")), class = "data.frame", row.names = c(NA,
-6L))