Search code examples
rdplyrcountdate-range

How to count the number of observation that fall in quarter based on start and end date?


   patient_id dt_diag_init enroll    
      <int64> <date>       <date>    
 1      10401 2018-01-04   2020-09-30
 2      60701 2019-05-31   2019-09-30
 3     343702 2018-12-05   2020-09-30
 4     472202 2019-12-30   2020-09-30
 5     489502 2019-09-17   2019-11-30
 6     557401 2019-10-15   2020-09-30
 7     857901 2018-01-02   2020-09-30
 8     874201 2018-01-01   2020-09-30
 9    1309102 2019-03-11   2020-09-30
10    1317601 2018-08-14   2020-09-30

I am trying to tally the number of patients per quarter based on the date range (dt_diag_init and enroll). The patient_id 10401 has starts from 2018-01-04 to 2020-09-30 so that patient would be counted in Q1'18, Q2'18....to Q3'20 since the date range overlap. I am trying to get an output table with just the Quarter and the count of patients per quarter.

Sample output table

Qtr    year total   
   <chr> <dbl> <dbl>
 1 Q1'18  2018 485 
 2 Q2'18  2018 516 
 3 Q3'18  2018 560

This is what I tried to create the table with quarter column before doing the tally but I am getting errors:

df_1 <- df %>%
  mutate(quarter = map2(
    as.numeric(dt_diag_init),
    as.numeric(enroll),
    ~ format(seq(.x, .y, by="quarter"), "Q%q'%y")
  ))

Sample date:

df <- structure(list(patient_id = structure(c(5.13877678239481e-320, 
2.99902787682095e-319, 1.69811350606928e-318, 2.33298786097528e-318, 
2.41846121770582e-318, 2.75392685057557e-318, 4.23859411632851e-318, 
4.31912681660064e-318, 6.46782325102068e-318, 6.50981389026072e-318
), class = "integer64"), dt_diag_init = structure(c(17535, 18047, 
17870, 18260, 18156, 18184, 17533, 17532, 17966, 17757), class = "Date"), 
    enroll = structure(c(18535, 18169, 18535, 18535, 18230, 18535, 
    18535, 18535, 18535, 18535), class = "Date")), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

Solution

  • Perhaps this helps

    library(dplyr)
    library(zoo)
    library(tidyr)
    library(purrr)
    library(lubridate)
    df %>%  
       mutate(quarter = map2(dt_diag_init, enroll, 
        ~ as.yearqtr(seq(.x, .y, by = 'quarter')))) %>%
       unnest(quarter) %>%
       mutate(year = year(quarter)) %>% 
       count(quarter, year, name = 'total')
    

    -output

    # A tibble: 11 × 3
       quarter    year total
       <yearqtr> <dbl> <int>
     1 2018 Q1    2018     3
     2 2018 Q2    2018     3
     3 2018 Q3    2018     4
     4 2018 Q4    2018     5
     5 2019 Q1    2019     6
     6 2019 Q2    2019     7
     7 2019 Q3    2019     8
     8 2019 Q4    2019     8
     9 2020 Q1    2020     8
    10 2020 Q2    2020     8
    11 2020 Q3    2020     8