Search code examples
rdata.tableposixoverlap

Select rows by date within interval in data.table


I would like to select the observations in a data table that fall within intervals of time specified in a second data table - the intervals are periods of time when observations are made from 2 platforms concurrently.

The first data table looks something like this. It is a bunch of animal sightings.

obs = data.table(sighting = as.POSIXct(c("2018-08-12 16:30:00", "2018-08-12 16:35:00", "2018-08-12 16:38:00", "2107-08-13 15:13:00", "2107-08-13 16:13:00", "2017-08-14 11:12:13"), format = "%Y-%m-%d %H:%M:%OS", tz = "America/Halifax"), encounter = c("1", "1", "1", "2", "3", "4"), what = c("frog", "frog", "toad", "bird", "goat","bird"))

Observations were made from 2 platforms.

platformA = data.table(station = "A", on.effort = as.POSIXct(c("2018-08-12 16:00:00", "2018-08-12 17:35:00","2017-08-14 11:00:13", "2018-08-15 17:35:00"), format = "%Y-%m-%d %H:%M:%OS", tz = "America/Halifax"), off.effort = as.POSIXct(c("2018-08-12 16:36:00", "2018-08-12 18:35:00","2017-08-14 12:12:13", "2018-08-15 18:35:00"), format = "%Y-%m-%d %H:%M:%OS", tz = "America/Halifax"))

platformB = data.table(station = "B", on.effort = as.POSIXct(c("2018-08-12 16:15:00", "2018-08-12 17:40:00", "2018-08-13 17:40:00","2017-08-14 11:05:13"), format = "%Y-%m-%d %H:%M:%OS", tz = "America/Halifax"), off.effort = as.POSIXct(c("2018-08-12 16:40:00", "2018-08-13 17:45:00", "2018-08-12 18:20:00","2017-08-14 12:30:13"), format = "%Y-%m-%d %H:%M:%OS", tz = "America/Halifax"))

I first calculated intervals for each platform, and then intersected the intervals to find out when observations were made concurrently.

setkey(platformA, on.effort, off.effort)
setkey(platformB, on.effort, off.effort)

common = foverlaps(platformA, platformB,type="any",nomatch=0)

common$x = intersect(interval(common$on.effort, common$off.effort), 
                     interval(common$i.on.effort, common$i.off.effort))

I'd like to end up with a table that is a subset of "obs" and contains only the rows covered by the intervals in "common$x". I had hoped to use foverlaps to find the rows in the intersected intervals, and created "point" intervals for my sightings with

obs[, sighting2 := sighting]

But foverlaps wants the "start" and "end" of each interval in separate columns, which is not how the intervals are stored in common$x.

I would like my output to look like this

           sighting encounter what
2018-08-12 16:30:00         1 frog
2018-08-12 16:35:00         1 frog
2017-08-14 11:12:13         4 bird

I'd appreciate any tips. Perhaps I could have been more efficient earlier on? Thanks.


Solution

  • I think this should work even if you have varying observation numbers between the platforms. Using your obs, platformA, and platformB data as above, make intervals for both platforms more or less as you did above in common:

    common = intersect(interval(platformA$on.effort, platformA$off.effort), 
                       interval(platformB$on.effort, platformB$off.effort))
    

    You should be able to use %within% to check if there are any cases where the sighting falls within the common interval:

    obs$both.seen <- sapply(obs$sighting, function(s){
      any(s %within% common)
    })
    

    OR

    obs[, both.seen := sapply(sighting, function(x) any(x %within% common))]
    

    New obs:

    > obs
                  sighting encounter what both.seen
    1: 2018-08-12 16:30:00         1 frog      TRUE
    2: 2018-08-12 16:35:00         1 frog      TRUE
    3: 2018-08-12 16:38:00         1 toad     FALSE
    4: 2107-08-13 15:13:00         2 bird     FALSE
    5: 2107-08-13 16:13:00         3 goat     FALSE
    6: 2017-08-14 11:12:13         4 bird      TRUE
    

    Subset to get to your desired output:

    obs <- obs[both.seen == 1][, both.seen := NULL][]
    
    > obs
                  sighting encounter what
    1: 2018-08-12 16:30:00         1 frog
    2: 2018-08-12 16:35:00         1 frog
    3: 2017-08-14 11:12:13         4 bird