r, lubridate, set-difference

R lubridate: find non-overlapping periods between a continuous time frame and a set of intervals


I've got the following time frame (X_Period) and set of intervals (Y_Periods):

A <- c('2016-01-01', '2019-01-05')
B <- c('2017-05-05','2019-06-05')

X_Period <- interval("2015-01-01", "2019-12-31")
Y_Periods <- interval(A, B)

I'd like to find the non-overlapping periods between X_Period and Y_Periods, so that the result would be:

[1]'2015-01-01'--'2015-12-31'
[2]'2017-05-06'--'2019-01-04'
[3]'2019-06-06'--'2019-12-31'

I'm trying to use setdiff, but it does not work:

setdiff(X_Period, Y_Periods)

Solution

  • Here is an option:

    library(lubridate)
    seq_X <- as.Date(seq(int_start(X_Period), int_end(X_Period), by = "1 day"))
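    # Index into Y_Periods so each element stays an Interval; lapply always returns a list
    seq_Y <- as.Date(do.call("c", lapply(seq_along(Y_Periods), function(i)
        seq(int_start(Y_Periods[i]), int_end(Y_Periods[i]), by = "1 day"))))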
    
    unique_dates_X <- seq_X[!seq_X %in% seq_Y]
    
    # Group consecutive dates: the group id increases whenever the gap is not 1 day
    lst <- aggregate(
        unique_dates_X,
        by = list(cumsum(c(0, diff(unique_dates_X) != 1))),
        FUN = function(x) c(min(x), max(x)),
        simplify = FALSE)$x
    
    lapply(lst, function(x) interval(x[1], x[2]))
    #[[1]]
    #[1] 2015-01-01 UTC--2015-12-31 UTC
    #
    #[[2]]
    #[1] 2017-05-06 UTC--2019-01-04 UTC
    #
    #[[3]]
    #[1] 2019-06-06 UTC--2019-12-31 UTC
    

    The strategy is to expand the intervals into daily sequences (one for X_Period and one for Y_Periods), then keep only the days that belong to X_Period but not to Y_Periods. We then aggregate to find the first and last date of each run of consecutive dates; the cumsum over the day-to-day differences gives every run its own group id. The resulting lst is a list of start/end date pairs, which we loop over and convert to intervals.
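
If enumerating every day is too slow for long time frames, the same gaps can probably be derived from the interval end points alone. The following is only a sketch, not part of the answer above: interval_gaps is a hypothetical helper (it is not a lubridate function), and it assumes day resolution, inclusive end points, and that the intervals in Y_Periods do not overlap one another.

```r
library(lubridate)

# Data from the question
A <- c("2016-01-01", "2019-01-05")
B <- c("2017-05-05", "2019-06-05")
X_Period  <- interval("2015-01-01", "2019-12-31")
Y_Periods <- interval(A, B)

# Hypothetical helper (not part of lubridate): periods of `x` not covered by `ys`.
# Assumes day resolution, inclusive end points, and mutually non-overlapping `ys`.
interval_gaps <- function(x, ys) {
  # Sort the intervals by start date
  ys <- ys[order(int_start(ys))]
  # Candidate gap starts: the start of x, then the day after each y ends
  gap_start <- c(as.Date(int_start(x)), as.Date(int_end(ys)) + 1)
  # Candidate gap ends: the day before each y starts, then the end of x
  gap_end <- c(as.Date(int_start(ys)) - 1, as.Date(int_end(x)))
  # Drop empty candidates (adjacent intervals, or a y reaching past x)
  keep <- gap_start <= gap_end
  interval(gap_start[keep], gap_end[keep])
}

gaps <- interval_gaps(X_Period, Y_Periods)
gaps
```

For the data in the question this should give the same three periods as the day-by-day approach, returned as a single Interval vector rather than a list.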