Search code examples
rdplyrcounttime-series

Count events in R per row depending on data range


I have a dataset of 1000 patients (Patient 1-1000) undergoing the same type of procedure but by diffrent surgeons. I am interested to count the number of operations performed by each surgeon from study start date 27/06/2023 (for example) before the next procedure and insert this count/number per each patient/row. For example I need to know how many operations surgeon A performed on previous patients (1 and 2) before operating on patient 3 (e.g. 2 operation).Same for surgeon B etc.

I gues there is some formula in dplyr but I cannot get my head around it.

Image of dataset and what I need to get

Patient Surgeon Operation Date  Event before index (operation date)
1   A   28/06/2023  0
2   A   29/06/2023  1
3   A   30/06/2023  2
4   B   1/07/2023   0
5   C   2/07/2023   1
6   C   3/07/2023   2

Solution

  • Please try the below code which can be give the Event before index (operation date) even if the patient is repeated operated by surgeon previously please check the new variable

    however at some places it does not match i.e., highlighted with #

    library(dplyr)
    
    df_2 <- df %>% group_by(Surgeon,Patient) %>% 
      slice_tail(n=1) %>% group_by(Surgeon) %>% mutate(new=row_number()-1)
    
    df_or <- df %>% 
      left_join(df_2 %>% select(Patient, Surgeon, new), by=c('Surgeon','Patient')) 
    
    
    # output
    
    # A tibble: 6 × 5
      Patient Surgeon Operation_Date Event_before   new
        <dbl> <chr>   <date>                <dbl> <dbl>
    1       1 A       2023-06-28                0     0
    2       2 A       2023-06-29                1     1
    3       3 A       2023-06-30                2     2
    4       4 B       2023-07-01                0     0
    5       5 C       2023-07-02                1     0 #
    6       6 C       2023-07-03                2     1 #