Coercing an interval that was coerced into a character string back into an interval


Lubridate's interval function, together with its parsing functions, allows us to build an interval from strings.

# An illustrative example.
library(lubridate)
my_interval = interval(dmy("15/07/2019"), dmy("15/07/2020"))

When printed, my_interval looks like this: 2019-07-15 UTC--2020-07-15 UTC

Now, it is perfectly possible to coerce an interval into a character string with the as.character function. However, what about the inverse? Can an interval that was coerced into a character string be coerced back into an interval?

One may ask why I would want to coerce an interval into a character string in the first place. I have several intervals and would like to obtain their pairwise combinations with the combn function.

# Suppose x is a vector of intervals. Then, what I would like to execute is
combn(x, 2) %>% t() %>% as.data.frame() %>% mutate(overlap = int_overlaps(V1, V2))

However, when combn is applied to a vector of intervals, it returns a matrix of numbers for some reason. My plan is therefore to first coerce the vector of intervals into a character vector and then, once I have the combinations of the strings, turn the strings back into intervals.
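
The numeric matrix can be inspected directly. This is a minimal sketch, assuming (the post does not state this) that combn assembles its result into a plain matrix, which keeps only the underlying numeric data of an Interval; under that assumption the numbers should match the interval lengths in seconds that int_length returns. The three intervals are made up for illustration.

```r
library(lubridate)

# Three made-up intervals, built in one vectorised call
x <- interval(
  dmy(c("15/07/2019", "01/01/2020", "01/08/2020")),
  dmy(c("15/07/2020", "01/03/2020", "01/09/2020"))
)

m <- combn(x, 2)

class(m)  # matrix classes, not Interval
dim(m)    # one column per pair, which is why the pipeline above calls t()

# Print both to compare: combinations of the intervals versus
# combinations of their lengths in seconds
m
combn(int_length(x), 2)
```

If the two printed matrices agree, the start dates were lost in the process, so the intervals cannot be recovered from the combn output alone.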


Solution

  • 1) character to interval. If ch is the character representation of my_intervals, then my_intervals2 is its reconstruction from ch. We test this with a vector of intervals, my_intervals, which is also used in the other alternatives. Note that as.Date keeps only the date part, so this round trip assumes intervals whose endpoints are whole days.

    # test data: three copies of my_interval from the question
    my_intervals <- rep(my_interval, 3)
    
    ch <- format(my_intervals)  # rep("2019-07-15 UTC--2020-07-15 UTC", 3)
    
    my_intervals2 <- interval(as.Date(ch), as.Date(sub(".*--", "", ch)))
    
    identical(my_intervals, my_intervals2)
    ## [1] TRUE
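
To tie this back to the combn use case from the question, the round trip can be wrapped in a helper and applied after the combinations are formed. This is only a sketch: chr2int is a hypothetical name, the three intervals are made up for illustration, and it assumes date-only endpoints in UTC because as.Date drops any time of day.

```r
library(lubridate)
library(dplyr)

# Hypothetical helper: rebuild intervals from strings such as
# "2019-07-15 UTC--2020-07-15 UTC" (date-only endpoints assumed)
chr2int <- function(ch) {
  interval(as.Date(ch), as.Date(sub(".*--", "", ch)))
}

# Three made-up intervals, built in one vectorised call
x <- interval(
  dmy(c("15/07/2019", "01/01/2020", "01/08/2020")),
  dmy(c("15/07/2020", "01/03/2020", "01/09/2020"))
)

# Combine the strings, then convert each column back before testing overlap
pairs <- combn(format(x), 2) %>%
  t() %>%
  as.data.frame(stringsAsFactors = FALSE) %>%
  mutate(overlap = int_overlaps(chr2int(V1), chr2int(V2)))

pairs
```

With these made-up intervals only the first pair overlaps, since the second interval lies inside the first and the third starts after both have ended.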
    

    2) complex. Instead of converting to character and back, convert to complex and back. We use the same vector of intervals as above for testing.

    library(zoo)  # lets as.Date accept plain numbers without an origin
    
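    # pack each interval into one complex number: start date + end date * i
    int2cplx <- function(x) c(cbind(as.Date(int_start(x)), as.Date(int_end(x))) %*% c(1,1i))
    # unpack: the real part is the start date, the imaginary part the end date
    cplx2int <- function(x) interval(as.Date(Re(x)), as.Date(Im(x)))
    
    cplx <- int2cplx(my_intervals)
    identical(my_intervals, cplx2int(cplx))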
    ## [1] TRUE
    

    3) combn with indexes. If the only reason to convert back and forth between character and interval is to use combn, then run combn over the indexes instead, so the intervals are never coerced.

    library(dplyr)
    library(lubridate)
    
    # given a 2-vector of indexes, e.g. 1:2, and vector of intervals this returns
    #   1 row tibble with cols int1, int2, overlaps of classes interval, interval, logical
    ovrlap <- function(index, intervals) {
      ints <- intervals[index]
      tibble(int1 = ints[[1]], int2 = ints[[2]], overlaps = int_overlaps(int1, int2))
    } 
    
    my_intervals %>%
      combn(length(.), 2, ovrlap, intervals = ., simplify = FALSE) %>%
      bind_rows
    

    giving:

    # A tibble: 3 x 3
      int1                           int2                           overlaps
      <Interval>                     <Interval>                     <lgl>   
    1 2019-07-15 UTC--2020-07-15 UTC 2019-07-15 UTC--2020-07-15 UTC TRUE    
    2 2019-07-15 UTC--2020-07-15 UTC 2019-07-15 UTC--2020-07-15 UTC TRUE    
    3 2019-07-15 UTC--2020-07-15 UTC 2019-07-15 UTC--2020-07-15 UTC TRUE    
    

    4) list comprehension. Instead of using combn, we could use the listcompr package to generate the result with list comprehensions.

    library(lubridate)
    library(listcompr)
    
    overlap <- function(x) {
      n <- length(x)
      gen.data.frame(data.frame(int1 = x[i], int2 = x[j], 
        overlaps = int_overlaps(x[i], x[j])), i < j, i = 1:n, j = 1:n)
    }
    
    overlap(my_intervals)
    

    giving

                                 int1                           int2 overlaps
     1 2019-07-15 UTC--2020-07-15 UTC 2019-07-15 UTC--2020-07-15 UTC     TRUE
     2 2019-07-15 UTC--2020-07-15 UTC 2019-07-15 UTC--2020-07-15 UTC     TRUE
     3 2019-07-15 UTC--2020-07-15 UTC 2019-07-15 UTC--2020-07-15 UTC     TRUE