Search code examples
rdatematrixrow

Deal with following dates in a table in R


My problem is easy to explain :

  • I have one table with start dates and end dates and n rows ordered by "start date" (see image bellow - Yellow rows are the ones I want to have on one unique row with first start date and last end date)

Table with rows where dates follow

  • I would like to regroup dates on one row when start date n+1 == end date n. Here is an exemple of what I need as a reslut (image below)

Result i need

I tried to use for loops that compare the two vectors of dates (vectors extracted from the columns) but it does not really work...

I tried something like this to identify start date and end date :

'''

a = sort(data$Date_debut)
b = sort(data$Date_fin)

for(i in 1:(length(a)-1)){
   for(j in 2:length(a)){
      datedeb = a[j-1]
      if(b[i]+1 == a[j]){
       while(b[i]+1 == a[j] ){
          datefin = b[i+1]
          i = i+1}
      
    }
 }
}

''' datedeb = start date datefin = end date

Thank you for your help, I am open to ideas / ways to deal with this.


Solution

  • Here is one approach using tidyverse. For each Var1 group, create subgroups containing an index based on when the start date does not equal the previous row end date (keeping those rows together with the same index). Then you can group_by both the Var1 and the index together, and use the first start date and last end date as your date ranges.

    library(tidyverse)
    
    df %>%
      group_by(Var1) %>%
      mutate(i = cumsum(Start_date != lag(End_date, default = as.Date(-Inf)) + 1)) %>%
      group_by(i, .add = T) %>%
      summarise(Start_date = first(Start_date), End_date = last(End_date)) %>%
      select(-i)
    

    Output

      Var1  Start_date End_date  
      <chr> <date>     <date>    
    1 A     2019-01-02 2019-04-09
    2 A     2019-10-11 2019-10-11
    3 B     2019-12-03 2019-12-20
    4 C     2019-12-29 2019-12-31
    

    Data

    df <- structure(list(Var1 = c("A", "A", "A", "A", "B", "C"), Start_date = structure(c(17898, 
    17962, 17993, 18180, 18233, 18259), class = "Date"), End_date = structure(c(17961, 
    17992, 17995, 18180, 18250, 18261), class = "Date")), class = "data.frame", row.names = c(NA, 
    -6L))