Search code examples
rdatedata.tabletidyverselubridate

How to determine difference in days between two dates across two columns and two rows by group?


I am looking to determine the difference in days by groups across two columns and two rows. Essentially subtract from the End Day by the subsequent Start Day in the subsequent row and record the difference as new column in the data frame and start over when a new group (ID) has been identified.

Start_Date   End_Date     ID   
  
2014-05-09   2015-05-08   01
2015-05-09   2016-05-08   01 
2016-05-11   2017-05-10   01
2017-05-11   2018-05-10   01
2016-08-29   2017-08-28   02
2017-08-29   2018-08-28   02

The result should be something like table below.

Start_Date   End_Date     ID   Days_Difference 
  
2014-05-09   2015-05-08   01         NA
2015-05-09   2016-05-08   01         01
2016-05-11   2017-05-10   01         03
2017-05-11   2018-05-10   01         01
2016-08-29   2017-08-28   02         NA
2017-08-29   2018-08-28   02         01

Essentially I want to take the difference of the End Date and its left diagonal Start date across groups (ID). I am having a really hard time with this one. I don't think my code would be helpful. Any solution using tidyverse, data.table, or base R would be greatly appreciated!


Solution

  • We may get the difference between the lead (next element) of 'Start_Date' and 'End_Date' after grouping

    library(dplyr)
    df1 <- df1 %>%
       mutate(across(ends_with("Date"), as.Date)) %>%
       group_by(ID) %>% 
       mutate(Days_Difference = as.numeric(lag(lead(Start_Date) - End_Date))) %>% 
       ungroup
    

    -output

    df1
    # A tibble: 6 × 4
      Start_Date End_Date      ID Days_Difference
      <date>     <date>     <int>           <dbl>
    1 2014-05-09 2015-05-08     1              NA
    2 2015-05-09 2016-05-08     1               1
    3 2016-05-11 2017-05-10     1               3
    4 2017-05-11 2018-05-10     1               1
    5 2016-08-29 2017-08-28     2              NA
    6 2017-08-29 2018-08-28     2               1
    

    Or a similar logic with data.table

    library(data.table)
    setDT(df1)[, Days_Difference := 
        as.numeric(shift(shift(as.IDate(Start_Date), type = "lead") - 
           as.IDate(End_Date))), ID]
    

    -output

    > df1
       Start_Date   End_Date    ID Days_Difference
           <char>     <char> <int>           <num>
    1: 2014-05-09 2015-05-08     1              NA
    2: 2015-05-09 2016-05-08     1               1
    3: 2016-05-11 2017-05-10     1               3
    4: 2017-05-11 2018-05-10     1               1
    5: 2016-08-29 2017-08-28     2              NA
    6: 2017-08-29 2018-08-28     2               1
    

    data

    df1 <- structure(list(Start_Date = c("2014-05-09", "2015-05-09", "2016-05-11", 
    "2017-05-11", "2016-08-29", "2017-08-29"), End_Date = c("2015-05-08", 
    "2016-05-08", "2017-05-10", "2018-05-10", "2017-08-28", "2018-08-28"
    ), ID = c(1L, 1L, 1L, 1L, 2L, 2L)), class = "data.frame", 
    row.names = c(NA, 
    -6L))