
Temporally correlating events with date ranges


I have a dataframe of extreme temperature events in R. The data looks something like this:

# Create dummy data
data <- data.frame(
  date_start = c("1980-05-11", "1980-07-12", "1980-08-17", "1980-05-10", "1980-05-23"),
  date_end = c("1980-06-27", "1980-07-29", "1980-09-03", "1980-05-19", "1980-06-27"),
  lat = c(31, 31, 31, 32, 32),
  lon = c(-119, -119, -119, -120, -120)
)
data$date_start <- as.Date(data$date_start)
data$date_end <- as.Date(data$date_end)

data
#  date_start   date_end lat  lon
# 1 1980-05-11 1980-06-27  31 -119
# 2 1980-07-12 1980-07-29  31 -119
# 3 1980-08-17 1980-09-03  31 -119
# 4 1980-05-10 1980-05-19  32 -120
# 5 1980-05-23 1980-06-27  32 -120

I want to correlate events with overlapping time windows (e.g. the events in the 1st and 4th rows). The goal is to understand the spatial extent of each event, i.e. how many lat/lon values it extends over.

Given date_start and date_end values, what is the best way to group my dataset into events which occurred concurrently?

I have tried looping through every day of interest and extracting the events for that day, but then I still have to correlate the events across dates. This method is also very inefficient.

I think there could be some useful content in this post, but this doesn't quite get me where I need to be since my events have a date range rather than a single date per event.

Any thoughts would be deeply appreciated!!


Solution

  • Another possible solution would be to use the ivs package, which was specifically created for working with interval data like this.

    library(ivs)
    
    # Create dummy data
    data <- data.frame(
      date_start = c("1980-05-11", "1980-07-12", "1980-08-17", "1980-05-10", "1980-05-23"),
      date_end = c("1980-06-27", "1980-07-29", "1980-09-03", "1980-05-19", "1980-06-27"),
      lat = c(31, 31, 31, 32, 32),
      lon = c(-119, -119, -119, -120, -120)
    )
    data$date_start <- as.Date(data$date_start)
    data$date_end <- as.Date(data$date_end)
    
    # Combine start and end into one interval vector.
    # Note that iv() treats each interval as half-open: [start, end)
    data$range <- iv(data$date_start, data$date_end)
    data$date_start <- NULL
    data$date_end <- NULL
    
    # Identify the interval "group" that each row falls in
    data$group <- iv_identify_group(data$range)
    
    # Optionally, turn that into an integer id for grouping
    # (but you could also group on `group`)
    data$id <- vctrs::vec_group_id(data$group)
    
    data
    #>   lat  lon                    range                    group id
    #> 1  31 -119 [1980-05-11, 1980-06-27) [1980-05-10, 1980-06-27)  1
    #> 2  31 -119 [1980-07-12, 1980-07-29) [1980-07-12, 1980-07-29)  2
    #> 3  31 -119 [1980-08-17, 1980-09-03) [1980-08-17, 1980-09-03)  3
    #> 4  32 -120 [1980-05-10, 1980-05-19) [1980-05-10, 1980-06-27)  1
    #> 5  32 -120 [1980-05-23, 1980-06-27) [1980-05-10, 1980-06-27)  1
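
If you would rather not add a dependency, the same grouping can be sketched in base R, followed by the spatial-extent count the question asks about. This is only a minimal sketch under a few assumptions: events that merely touch (one starts on the day another ends) are treated as the same group, "spatial extent" is taken to mean the number of distinct lat/lon pairs per group, and the names `ord`, `prev_max_end`, `cells` and `n_cells` are illustrative, not part of any package.

```r
# Dummy data from the question
data <- data.frame(
  date_start = as.Date(c("1980-05-11", "1980-07-12", "1980-08-17", "1980-05-10", "1980-05-23")),
  date_end = as.Date(c("1980-06-27", "1980-07-29", "1980-09-03", "1980-05-19", "1980-06-27")),
  lat = c(31, 31, 31, 32, 32),
  lon = c(-119, -119, -119, -120, -120)
)

# Sort events by start date and work with plain day counts
ord <- order(data$date_start)
start <- as.numeric(data$date_start[ord])
end <- as.numeric(data$date_end[ord])

# Latest end date among all earlier events (-Inf for the first event)
prev_max_end <- c(-Inf, head(cummax(end), -1))

# A new group starts whenever an event begins after everything before it has ended
data$id <- integer(nrow(data))
data$id[ord] <- cumsum(start > prev_max_end)

# Spatial extent (assumed definition): distinct lat/lon pairs per group
cells <- unique(data[, c("id", "lat", "lon")])
n_cells <- tapply(seq_len(nrow(cells)), cells$id, length)

data$id
n_cells
```

On the dummy data this puts rows 1, 4 and 5 in one group, like the `ivs` result above, and that group spans two lat/lon cells. The running maximum matters: comparing each event only with the one immediately before it would split a group whenever a long event contains several shorter ones.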