
Computing number of participants by timepoints


I have a short question about computing the number of participants by timepoints. Consider the sample long format data:

data <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  survey_date = c("01/12/2020", "02/12/2020", "03/12/2020", "04/12/2020",
                  "01/31/2020", "03/12/2020", "04/05/2020",
                  "02/12/2020", "04/12/2020", "05/12/2020", "06/12/2020"),
  last_seen = c("05/25/2020", "05/25/2020", "05/25/2020", "05/25/2020",
                "04/06/2020", "04/06/2020", "04/06/2020",
                "", "", "", "")
)

Survey date indicates when the survey took place. Some subjects were lost to follow-up; for those subjects, the last-seen date is recorded in last_seen (it appears in every row for that subject).

I would like to add a column 'num_N' to the existing data giving the number of participants who were in the study on that row's survey_date.

For example, on 01/12/2020, only subject id=1 was in the study, so num_N for that row would be 1.
On 06/12/2020, only subject id=3 was in the study, so num_N for that row would be 1.

Any help would be appreciated. Thanks!


Solution

  • This is a good case for iv_count_between from ivs:

    library(dplyr)
    library(ivs)
    
    # Convert the date columns from character to Date (empty last_seen values become NA)
    data <- data |> 
      mutate(across(-id, lubridate::mdy)) 
    
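    # One interval per participant: first survey to the later of the last survey and last_seen.
    # iv() intervals are right-open, so add 1 day to include the final day.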
    data_ivs <- 
      data |> 
      summarise(min = min(survey_date), 
                max = max(survey_date, last_seen, na.rm = TRUE), 
                .by = id) |> 
      mutate(ivs = iv(min, max + 1))
    
    # For each survey_date, count how many participants' intervals contain it:
    data |> 
      mutate(num_N = iv_count_between(survey_date, data_ivs$ivs))
    
    #    id survey_date  last_seen num_N
    # 1   1  2020-01-12 2020-05-25     1
    # 2   1  2020-02-12 2020-05-25     3
    # 3   1  2020-03-12 2020-05-25     3
    # 4   1  2020-04-12 2020-05-25     2
    # 5   2  2020-01-31 2020-04-06     2
    # 6   2  2020-03-12 2020-04-06     3
    # 7   2  2020-04-05 2020-04-06     3
    # 8   3  2020-02-12       <NA>     3
    # 9   3  2020-04-12       <NA>     2
    # 10  3  2020-05-12       <NA>     2
    # 11  3  2020-06-12       <NA>     1
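
As a cross-check without the ivs dependency, the same count can be sketched in base R. This is a minimal sketch rather than part of the original answer: it assumes a participant counts as in the study from their first survey through the later of their last survey and last_seen (both ends inclusive), and the names dat, per_id, start and end are hypothetical.

```r
# Rebuild the sample data (named `dat` here to avoid clobbering `data`)
dat <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  survey_date = c("01/12/2020", "02/12/2020", "03/12/2020", "04/12/2020",
                  "01/31/2020", "03/12/2020", "04/05/2020",
                  "02/12/2020", "04/12/2020", "05/12/2020", "06/12/2020"),
  last_seen = c("05/25/2020", "05/25/2020", "05/25/2020", "05/25/2020",
                "04/06/2020", "04/06/2020", "04/06/2020",
                "", "", "", "")
)

# Parse month/day/year strings; empty strings become NA
dat$survey_date <- as.Date(dat$survey_date, format = "%m/%d/%Y")
dat$last_seen   <- as.Date(dat$last_seen,   format = "%m/%d/%Y")

# Per participant: first day and last day in the study, as day counts
per_id <- split(dat, dat$id)
start <- vapply(per_id, function(d) as.numeric(min(d$survey_date)), numeric(1))
end   <- vapply(per_id, function(d) {
  as.numeric(max(c(d$survey_date, d$last_seen), na.rm = TRUE))
}, numeric(1))

# For each row, count participants whose [start, end] range contains survey_date
dat$num_N <- sapply(as.numeric(dat$survey_date),
                    function(day) sum(start <= day & day <= end))

dat$num_N
# [1] 1 3 3 2 2 3 3 3 2 2 1
```

This compares every row against every participant, which is fine for small data; for large data the interval-based approach above is likely to scale better.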