Search code examples
rdatecountintervalsdate-range

count row if date falls within date range for all dates in series in R


I have a large data frame (~30,000 rows) where I have two date fields "start_date" and "end_date".

I want to summarise the data such that I have 1 column with all the dates and a second column with a count of all the rows in which that date is between the "start_date" and "end_date".

I can make this work using 2 for loops but it is very inefficient as it is going by one though one comparing about 180 dates to 30,000 rows of date ranges.

Below is an example. Say I have the following dataframe.

df <- tibble(
  start_date = c(1,1,2,2,3,3,4,4,5,5),
  end_date = c(2,3,4,5,6,7,8,9,10,11)
)

I want this to output a table/dataframe that looks like this

Date    Count
1       2
2       4
3       5
4       6
5       7
6       6
7       5
8       4
9       3
10      2
11      1

Is there some TidyVerse functions or anything else that could do this transformation efficiently?


Solution

  • Here's a base R method:

    date = seq(min(df$start_date), max(df$end_date))
    count = sapply(date, \(x) sum(x >= df$start_date & x <= df$end_date))
    data.frame(date, count)
    #    date count
    # 1     1     2
    # 2     2     4
    # 3     3     5
    # 4     4     6
    # 5     5     7
    # 6     6     6
    # 7     7     5
    # 8     8     4
    # 9     9     3
    # 10   10     2
    # 11   11     1