I have a dataframe with group id, start time, and end time. I would like to compute overlapped time interval between groups. Here is an example of the dataset;
id <- c("a","a","b","c","c")
start_time <-as.POSIXct(c("2016-05-27 09:30:00","2016-05-27 15:30:00",
"2016-05-27 14:30:00","2016-05-27 09:40:00","2016-05-27 15:00:00"),tz= "UTC")
end_time <-as.POSIXct(c("2016-05-27 10:30:00","2016-05-27 17:30:00",
"2016-05-27 16:30:00","2016-05-27 09:50:00","2016-05-27 16:00:00"),tz= "UTC")
df <- data.frame(id,start_time,end_time)
and the example dataframe looks like:
ID start_time end_time
1 a 2016-05-27 09:30:00 2016-05-27 10:30:00
2 a 2016-05-27 15:30:00 2016-05-27 17:30:00
3 b 2016-05-27 14:30:00 2016-05-27 16:30:00
4 c 2016-05-27 09:40:00 2016-05-27 09:50:00
5 c 2016-05-27 15:00:00 2016-05-27 16:00:00
The desired result from the suggested dataframe is
ID_1 ID_2 overlap
1 a b 0 + 60 mins
2 a c 10 + 0 + 0 + 30 mins
3 b c 0 + 60 mins
The last column does not have to show all the cases. It is just to help your understand. Would there be anyway to compute total overlapped time between groups by comparing all the time intervals?
Here goes:
library(magrittr)
library(lubridate)
library(tidyr)
df %<>% mutate( interval = interval( start_time, end_time ) )
df %>% full_join( df, by=character(), suffix=c("_1","_2") ) %>%
mutate( overlap = lubridate::intersect( interval_1, interval_2 ) ) %>%
filter( id_1 < id_2 ) %>%
replace_na( list(overlap=0) ) %>%
group_by( id_1, id_2 ) %>%
summarise( overlap = paste(paste( as.numeric( overlap ) / 60, collapse=" + " ),"mins"))
various lubridate functions are key to the solution, the rest is just infrastructure
Output:
id_1 id_2 overlap
<chr> <chr> <chr>
1 a b 0 + 60 mins
2 a c 10 + 0 + 0 + 30 mins
3 b c 0 + 60 mins