Search code examples
rdesign-patternscountconditional-statementssequence

Sequential Count of Treatment Episodes in R Dataframe


I have a data set of different patient ID's, clinic visit dates, and attendance (see example data below, separated by patient ID for clarity).

I am interested in sequentially counting treatment episodes, which are defined as attending >= 4 visits for their starting month, followed by >= 1 visit every month afterwards. If a patient attends <1 visit after starting (i.e., after completing their initial >=4 visits in the starting month), that treatment episode is considered ended. A new treatment episode subsequently starts the next time a patient attends >= 4 visits in a given month, and that same episode continues as long as the patient attends >=1 visit/month thereafter. When patients either do not meet or break this pattern, I'd like to input 0.

Example data (note: I've excluded each day's date to prevent the example from being excessively long and re-produced dates to give a clearer picture of the desired data):

Patient ID Visit Date Attendance
1 01/01/2023 Yes
1 01/02/2023 Yes
1 01/03/2023 Yes
1 01/04/2023 Yes
1 02/01/2023 Yes
1 03/01/2023 Yes
1 04/01/2023 No
1 05/01/2023 Yes
1 06/01/2023 No
1 07/01/2023 Yes
1 07/02/2023 Yes
1 07/03/2023 Yes
1 07/04/2023 Yes
1 08/01/2023 Yes
---------- ---------- ----------
Patient ID Visit Date Attendance
---------- ---------- ----------
2 01/01/2023 Yes
2 02/01/2023 Yes
2 03/01/2023 Yes
2 03/02/2023 Yes
2 03/03/2023 Yes
2 03/04/2023 Yes
2 04/01/2023 Yes
2 05/01/2023 Yes
2 07/01/2023 Yes

Desired data:

Patient ID Visit Date Attendance Tx Episode
1 01/01/2023 Yes 1
1 01/02/2023 Yes 1
1 01/03/2023 Yes 1
1 01/04/2023 Yes 1
1 02/01/2023 Yes 1
1 03/01/2023 Yes 1
1 04/01/2023 No 0
1 05/01/2023 Yes 0
1 06/01/2023 No 0
1 07/01/2023 Yes 2
1 07/02/2023 Yes 2
1 07/03/2023 Yes 2
1 07/04/2023 Yes 2
1 08/01/2023 Yes 2
---------- ---------- ---------- ----------
Patient ID Visit Date Attendance Tx Episode
---------- ---------- ---------- ----------
2 01/01/2023 Yes 0
2 02/01/2023 Yes 0
2 03/01/2023 Yes 1
2 03/02/2023 Yes 1
2 03/03/2023 Yes 1
2 03/04/2023 Yes 1
2 04/01/2023 Yes 1
2 05/01/2023 Yes 1
2 07/01/2023 Yes 0

I am somewhat new to programming in R and have initially attempted to use ifelse() but wasn't able to come up with logicals that worked. I've also attempted to write loops, which have failed to run.

Any help would be greatly appreciated and I'm happy to provide more detail if the above isn't clear.

Thanks in advance for your time/effort!


Solution

  • This seems fairly complex, and not sure of entire logic, but thought this may help. This uses the lubridate library, but otherwise base R functions. A helper function elapsed_months was borrowed from here.

    First an empty list is created enc_list that will store results for the final data.frame.

    We construct two loops - the first to analyze data for each Patient_ID, and the second to evaluate encounters for that given patient.

    Note that I subset based on Attendance being "Yes" - if not attended, would not want to include that data in evaluation. This is an assumption on my part.

    A table of months for the Visit_Date is made so that we know which months have >= 4 encounters.

    The enc_active is a simple flag on whether row-by-row we are dealing with an active encounter. The enc_num is the number treatment encounter that is incremented when new treatment encounters are discovered.

    Going row-by-row through encounter data, first check if in an active encounter. If it is, check if the number of elapsed months is 0 (same month) or 1 (consecutive month). If true, then record that encounter. If not true, then the treatment encounter is over.

    If not an active encounter, check if has a month with 4+ encounters, and if it does, set to a new active treatment encounter. Note that in cases were not true, it will record 0 for Tx_Encounter and then reset the flag.

    The final results are stored back in the list which will be bound together with rbind (row bind) in the end.

    The merge will combine the results with the original data.frame, which will be needed since the rows with Attendance or "No" were removed early on. Since the merge will leave Tx_Encounter with missing for those "No"s, we'll replace NA with 0.

    Some example data was adapted from your comment. Please let me know of questions - happy to do a StackOverflow chat to go over as well. I do have an interest in this form of data from my own experiences.

    library(lubridate)
    
    elapsed_months <- function(end_date, start_date) {
      ed <- as.POSIXlt(end_date)
      sd <- as.POSIXlt(start_date)
      12 * (ed$year - sd$year) + (ed$mon - sd$mon)
    }
    
    enc_list <- list()
    
    for (id in unique(df$Patient_ID)) {
      enc_data <- df[df$Patient_ID == id & df$Attendance == "Yes", ]
      enc_month <- table(cut(enc_data$Visit_Date, 'month'))
      enc_active <- F
      enc_num <- 0
      for (i in 1:nrow(enc_data)) {
        if (enc_active) {
          if(elapsed_months(enc_data$Visit_Date[i], enc_data$Visit_Date[i - 1]) <= 1) {
            enc_data[i, "Tx_Episode"] <- enc_num
          } else {
            enc_active = F
            enc_data[i, "Tx_Episode"] <- 0
          }
        } else {
          if(enc_month[as.character(floor_date(enc_data$Visit_Date[i], unit = "month"))] >= 4) {
            enc_active = T
            enc_num <- enc_num + 1
            enc_data[i, "Tx_Episode"] <- enc_num
          } else {
            enc_data[i, "Tx_Episode"] <- 0
          }
        }
      }
      enc_list[[id]] <- enc_data
    }
    
    df_final <- merge(
      do.call('rbind', enc_list),
      df,
      all.y = T
    )
    
    df_final$Tx_Episode[is.na(df_final$Tx_Episode)] <- 0
    

    Output

       Patient_ID Visit_Date Attendance Tx_Episode
    1           1 2023-01-01        Yes          1
    2           1 2023-01-02        Yes          1
    3           1 2023-01-03        Yes          1
    4           1 2023-01-04        Yes          1
    5           1 2023-02-01        Yes          1
    6           1 2023-03-01        Yes          1
    7           1 2023-04-01         No          0
    8           1 2023-05-01        Yes          0
    9           1 2023-06-01         No          0
    10          1 2023-07-01        Yes          2
    11          1 2023-07-02        Yes          2
    12          1 2023-07-03        Yes          2
    13          1 2023-07-04        Yes          2
    14          1 2023-08-01        Yes          2
    15          2 2023-01-01        Yes          0
    16          2 2023-02-01        Yes          0
    17          2 2023-03-01        Yes          1
    18          2 2023-03-02        Yes          1
    19          2 2023-03-03        Yes          1
    20          2 2023-03-04        Yes          1
    21          2 2023-04-01        Yes          1
    22          2 2023-04-02        Yes          1
    23          2 2023-04-03        Yes          1
    24          2 2023-04-04        Yes          1
    25          2 2023-06-12        Yes          0