Search code examples
rdataframetimeoverlap

How to compute overlapping time intervals between groups


I have a dataframe with group id, start time, and end time. I would like to compute overlapped time interval between groups. Here is an example of the dataset;

id <- c("a","a","b","c","c")
start_time <-as.POSIXct(c("2016-05-27 09:30:00","2016-05-27 15:30:00",
                          "2016-05-27 14:30:00","2016-05-27 09:40:00","2016-05-27 15:00:00"),tz= "UTC")
end_time <-as.POSIXct(c("2016-05-27 10:30:00","2016-05-27 17:30:00",
                        "2016-05-27 16:30:00","2016-05-27 09:50:00","2016-05-27 16:00:00"),tz= "UTC")

df <- data.frame(id,start_time,end_time)

and the example dataframe looks like:

            ID             start_time           end_time
1           a        2016-05-27 09:30:00    2016-05-27 10:30:00
2           a        2016-05-27 15:30:00    2016-05-27 17:30:00
3           b        2016-05-27 14:30:00    2016-05-27 16:30:00
4           c        2016-05-27 09:40:00    2016-05-27 09:50:00
5           c        2016-05-27 15:00:00    2016-05-27 16:00:00

The desired result from the suggested dataframe is

            ID_1             ID_2        overlap
1           a                 b         0 + 60 mins
2           a                 c        10 + 0 + 0 + 30 mins
3           b                 c         0 + 60 mins

The last column does not have to show all the cases. It is just to help your understand. Would there be anyway to compute total overlapped time between groups by comparing all the time intervals?


Solution

  • Here goes:

    
    
    library(magrittr)
    library(lubridate)
    library(tidyr)
    
    df %<>% mutate( interval = interval( start_time, end_time ) )
    
    df %>% full_join( df, by=character(), suffix=c("_1","_2") ) %>%
        mutate( overlap = lubridate::intersect( interval_1, interval_2 ) ) %>%
        filter( id_1 < id_2 ) %>%
        replace_na( list(overlap=0) ) %>%
        group_by( id_1, id_2 ) %>%
        summarise( overlap = paste(paste( as.numeric( overlap ) / 60, collapse=" + " ),"mins"))
    
    

    various lubridate functions are key to the solution, the rest is just infrastructure

    Output:

    
      id_1  id_2  overlap              
      <chr> <chr> <chr>                
    1 a     b     0 + 60 mins          
    2 a     c     10 + 0 + 0 + 30 mins
    3 b     c     0 + 60 mins