Search code examples
rpurrrintervalslubridatemutate

Count the number of timestamps in a given vector that fall within an interval in R


I want to count the number of events that occur within intervals.

I start with a table that has three columns: start dates, end dates, and the interval created by them.

table <-
  tibble( 
    start = c( "2022-08-02", "2022-10-06", "2023-01-11"), 
    end = c("2022-08-04", "2023-02-06", "2023-02-04"), 
    interval = start %--% end
  )

I also have a vector of timestamp events:

events <- c(ymd("2022-08-07"), ymd("2022-10-17"), ymd("2023-01-17"), ymd("2023-02-02"))

For each interval in my table, I want to know how many events fell within that interval so that my final table looks something like this (but with the correct counts):

   start       end         interval                       n_events_within_interval
   <chr>       <chr>       <Interval>                                        <int>
 1 2022-08-02  2022-08-04  2022-08-02 UTC--2022-08-04 UTC                        2
 2 2022-10-06  2023-02-06  2022-10-06 UTC--2023-02-06 UTC                        2
 3 2023-01-11  2023-02-04  2023-01-11 UTC--2023-02-04 UTC                        2

I have tried this so far, but I'm not sure how to get mutate to cycle through the events vector for each row:

library(tidyverse)
library(lubridate)
library(purrr)

table <-
  tibble( 
    start = c( "2022-08-02", "2022-10-06", "2023-01-11"), 
    end = c("2022-08-04", "2023-02-06", "2023-02-04"), 
    interval = start %--% end
  )

table

#> # A tibble: 3 × 3
#>   start      end        interval                      
#>   <chr>      <chr>      <Interval>                    
#> 1 2022-08-02 2022-08-04 2022-08-02 UTC--2022-08-04 UTC
#> 2 2022-10-06 2023-02-06 2022-10-06 UTC--2023-02-06 UTC
#> 3 2023-01-11 2023-02-04 2023-01-11 UTC--2023-02-04 UTC


events <- c(ymd("2022-08-07"), ymd("2022-10-17"), ymd("2023-01-17"), ymd("2023-02-02"))
events
#> [1] "2022-08-07" "2022-10-17" "2023-01-17" "2023-02-02"


table %>% 
  mutate(
    n_events_within_interval = sum(events %within% interval)
  )
#> Warning in as.numeric(a) - as.numeric(int@start): longer object length is not a
#> multiple of shorter object length
#> Warning in as.numeric(a) - as.numeric(int@start) <= [email protected]: longer object
#> length is not a multiple of shorter object length
#> Warning in as.numeric(a) - as.numeric(int@start): longer object length is not a
#> multiple of shorter object length
#> # A tibble: 3 × 4
#>   start      end        interval                       n_events_within_interval
#>   <chr>      <chr>      <Interval>                                        <int>
#> 1 2022-08-02 2022-08-04 2022-08-02 UTC--2022-08-04 UTC                        2
#> 2 2022-10-06 2023-02-06 2022-10-06 UTC--2023-02-06 UTC                        2
#> 3 2023-01-11 2023-02-04 2023-01-11 UTC--2023-02-04 UTC                        2

Created on 2023-02-15 with reprex v2.0.2


Solution

  • We could use rowwise

    library(dplyr)
    library(lubridate)
    table %>% 
      rowwise %>% 
      mutate(n_events_within_interval = sum(events %within% interval)) %>%
      ungroup