I want to count the number of events that occur within intervals.
I start with a table that has three columns: start dates, end dates, and the interval created by them.
table <-
tibble(
start = c( "2022-08-02", "2022-10-06", "2023-01-11"),
end = c("2022-08-04", "2023-02-06", "2023-02-04"),
interval = start %--% end
)
I also have a vector of timestamp events:
events <- c(ymd("2022-08-07"), ymd("2022-10-17"), ymd("2023-01-17"), ymd("2023-02-02"))
For each interval in my table, I want to know how many events fell within that interval so that my final table looks something like this (but with the correct counts):
start end interval n_events_within_interval
<chr> <chr> <Interval> <int>
1 2022-08-02 2022-08-04 2022-08-02 UTC--2022-08-04 UTC 2
2 2022-10-06 2023-02-06 2022-10-06 UTC--2023-02-06 UTC 2
3 2023-01-11 2023-02-04 2023-01-11 UTC--2023-02-04 UTC 2
I have tried this so far, but I'm not sure how to get mutate to cycle through the events
vector for each row:
library(tidyverse)
library(lubridate)
library(purrr)
table <-
tibble(
start = c( "2022-08-02", "2022-10-06", "2023-01-11"),
end = c("2022-08-04", "2023-02-06", "2023-02-04"),
interval = start %--% end
)
table
#> # A tibble: 3 × 3
#> start end interval
#> <chr> <chr> <Interval>
#> 1 2022-08-02 2022-08-04 2022-08-02 UTC--2022-08-04 UTC
#> 2 2022-10-06 2023-02-06 2022-10-06 UTC--2023-02-06 UTC
#> 3 2023-01-11 2023-02-04 2023-01-11 UTC--2023-02-04 UTC
events <- c(ymd("2022-08-07"), ymd("2022-10-17"), ymd("2023-01-17"), ymd("2023-02-02"))
events
#> [1] "2022-08-07" "2022-10-17" "2023-01-17" "2023-02-02"
table %>%
mutate(
n_events_within_interval = sum(events %within% interval)
)
#> Warning in as.numeric(a) - as.numeric(int@start): longer object length is not a
#> multiple of shorter object length
#> Warning in as.numeric(a) - as.numeric(int@start) <= [email protected]: longer object
#> length is not a multiple of shorter object length
#> Warning in as.numeric(a) - as.numeric(int@start): longer object length is not a
#> multiple of shorter object length
#> # A tibble: 3 × 4
#> start end interval n_events_within_interval
#> <chr> <chr> <Interval> <int>
#> 1 2022-08-02 2022-08-04 2022-08-02 UTC--2022-08-04 UTC 2
#> 2 2022-10-06 2023-02-06 2022-10-06 UTC--2023-02-06 UTC 2
#> 3 2023-01-11 2023-02-04 2023-01-11 UTC--2023-02-04 UTC 2
Created on 2023-02-15 with reprex v2.0.2
We could use rowwise
library(dplyr)
library(lubridate)
table %>%
rowwise %>%
mutate(n_events_within_interval = sum(events %within% interval)) %>%
ungroup