Computing number of participants by timepoints

I have a short question about computing the number of participants by timepoints. Consider the sample long format data:

data<-data.frame(id=c(1,1,1,1,2,2,2,3,3,3,3),survey_date=c("01/12/2020","02/12/2020","03/12/2020","04/12/2020","01/31/2020","03/12/2020","04/05/2020","02/12/2020","04/12/2020","05/12/2020","06/12/2020"),last_seen=c('05/25/2020','05/25/2020','05/25/2020','05/25/2020','04/06/2020','04/06/2020','04/06/2020','','','',''))

Survey date indicates when the survey took place. \Some subjects in the data became lost to follow-up, we do have that last-seen date recorded in the data for those subjects (it appears in all rows for that subject).

I would like to add a column 'num_N' to the existing data indicating the number of participants who were in the study at the timepoint survey_date.

For example, on 01/12/2020, only subject id=1 was in the study, so num_N for that row would be 1.
On 06/12/2020, only subject id=3 was in the study, so num_N for that row would be 1.

Any help would be appreciated. Thanks!

Solution

This is a good case for iv_count_between from ivs:

library(dplyr)
library(ivs)

#Convert your dates to date format
data <- data |> 
  mutate(across(-id, lubridate::mdy)) 

#Create intervals
data_ivs <- 
  data |> 
  summarise(min = min(survey_date), 
            max = max(survey_date, last_seen, na.rm = TRUE), 
            .by = id) |> 
  mutate(ivs = iv(min, max + 1))

#Count intervals with `iv_count_between`:
data |> 
  mutate(num_N = iv_count_between(survey_date, data_ivs$ivs))

#    id survey_date  last_seen num_N
# 1   1  2020-01-12 2020-05-25     1
# 2   1  2020-02-12 2020-05-25     3
# 3   1  2020-03-12 2020-05-25     3
# 4   1  2020-04-12 2020-05-25     2
# 5   2  2020-01-31 2020-04-06     2
# 6   2  2020-03-12 2020-04-06     3
# 7   2  2020-04-05 2020-04-06     3
# 8   3  2020-02-12       <NA>     3
# 9   3  2020-04-12       <NA>     2
# 10  3  2020-05-12       <NA>     2
# 11  3  2020-06-12       <NA>     1