I have a data set of different patient ID's, clinic visit dates, and attendance (see example data below, separated by patient ID for clarity).
I am interested in sequentially counting treatment episodes, which are defined as attending >= 4 visits for their starting month, followed by >= 1 visit every month afterwards. If a patient attends <1 visit after starting (i.e., after completing their initial >=4 visits in the starting month), that treatment episode is considered ended. A new treatment episode subsequently starts the next time a patient attends >= 4 visits in a given month, and that same episode continues as long as the patient attends >=1 visit/month thereafter. When patients either do not meet or break this pattern, I'd like to input 0.
Example data (note: I've excluded each day's date to prevent the example from being excessively long and re-produced dates to give a clearer picture of the desired data):
Patient ID | Visit Date | Attendance |
---|---|---|
1 | 01/01/2023 | Yes |
1 | 01/02/2023 | Yes |
1 | 01/03/2023 | Yes |
1 | 01/04/2023 | Yes |
1 | 02/01/2023 | Yes |
1 | 03/01/2023 | Yes |
1 | 04/01/2023 | No |
1 | 05/01/2023 | Yes |
1 | 06/01/2023 | No |
1 | 07/01/2023 | Yes |
1 | 07/02/2023 | Yes |
1 | 07/03/2023 | Yes |
1 | 07/04/2023 | Yes |
1 | 08/01/2023 | Yes |
---------- | ---------- | ---------- |
Patient ID | Visit Date | Attendance |
---------- | ---------- | ---------- |
2 | 01/01/2023 | Yes |
2 | 02/01/2023 | Yes |
2 | 03/01/2023 | Yes |
2 | 03/02/2023 | Yes |
2 | 03/03/2023 | Yes |
2 | 03/04/2023 | Yes |
2 | 04/01/2023 | Yes |
2 | 05/01/2023 | Yes |
2 | 07/01/2023 | Yes |
Desired data:
Patient ID | Visit Date | Attendance | Tx Episode |
---|---|---|---|
1 | 01/01/2023 | Yes | 1 |
1 | 01/02/2023 | Yes | 1 |
1 | 01/03/2023 | Yes | 1 |
1 | 01/04/2023 | Yes | 1 |
1 | 02/01/2023 | Yes | 1 |
1 | 03/01/2023 | Yes | 1 |
1 | 04/01/2023 | No | 0 |
1 | 05/01/2023 | Yes | 0 |
1 | 06/01/2023 | No | 0 |
1 | 07/01/2023 | Yes | 2 |
1 | 07/02/2023 | Yes | 2 |
1 | 07/03/2023 | Yes | 2 |
1 | 07/04/2023 | Yes | 2 |
1 | 08/01/2023 | Yes | 2 |
---------- | ---------- | ---------- | ---------- |
Patient ID | Visit Date | Attendance | Tx Episode |
---------- | ---------- | ---------- | ---------- |
2 | 01/01/2023 | Yes | 0 |
2 | 02/01/2023 | Yes | 0 |
2 | 03/01/2023 | Yes | 1 |
2 | 03/02/2023 | Yes | 1 |
2 | 03/03/2023 | Yes | 1 |
2 | 03/04/2023 | Yes | 1 |
2 | 04/01/2023 | Yes | 1 |
2 | 05/01/2023 | Yes | 1 |
2 | 07/01/2023 | Yes | 0 |
I am somewhat new to programming in R and have initially attempted to use ifelse() but wasn't able to come up with logicals that worked. I've also attempted to write loops, which have failed to run.
Any help would be greatly appreciated and I'm happy to provide more detail if the above isn't clear.
Thanks in advance for your time/effort!
This seems fairly complex, and not sure of entire logic, but thought this may help. This uses the lubridate
library, but otherwise base R functions. A helper function elapsed_months
was borrowed from here.
First an empty list is created enc_list
that will store results for the final data.frame.
We construct two loops - the first to analyze data for each Patient_ID
, and the second to evaluate encounters for that given patient.
Note that I subset based on Attendance
being "Yes" - if not attended, would not want to include that data in evaluation. This is an assumption on my part.
A table
of months for the Visit_Date
is made so that we know which months have >= 4 encounters.
The enc_active
is a simple flag on whether row-by-row we are dealing with an active encounter. The enc_num
is the number treatment encounter that is incremented when new treatment encounters are discovered.
Going row-by-row through encounter data, first check if in an active encounter. If it is, check if the number of elapsed months is 0 (same month) or 1 (consecutive month). If true, then record that encounter. If not true, then the treatment encounter is over.
If not an active encounter, check if has a month with 4+ encounters, and if it does, set to a new active treatment encounter. Note that in cases were not true, it will record 0 for Tx_Encounter
and then reset the flag.
The final results are stored back in the list which will be bound together with rbind
(row bind) in the end.
The merge
will combine the results with the original data.frame, which will be needed since the rows with Attendance
or "No" were removed early on. Since the merge
will leave Tx_Encounter
with missing for those "No"s, we'll replace NA
with 0.
Some example data was adapted from your comment. Please let me know of questions - happy to do a StackOverflow chat to go over as well. I do have an interest in this form of data from my own experiences.
library(lubridate)
elapsed_months <- function(end_date, start_date) {
ed <- as.POSIXlt(end_date)
sd <- as.POSIXlt(start_date)
12 * (ed$year - sd$year) + (ed$mon - sd$mon)
}
enc_list <- list()
for (id in unique(df$Patient_ID)) {
enc_data <- df[df$Patient_ID == id & df$Attendance == "Yes", ]
enc_month <- table(cut(enc_data$Visit_Date, 'month'))
enc_active <- F
enc_num <- 0
for (i in 1:nrow(enc_data)) {
if (enc_active) {
if(elapsed_months(enc_data$Visit_Date[i], enc_data$Visit_Date[i - 1]) <= 1) {
enc_data[i, "Tx_Episode"] <- enc_num
} else {
enc_active = F
enc_data[i, "Tx_Episode"] <- 0
}
} else {
if(enc_month[as.character(floor_date(enc_data$Visit_Date[i], unit = "month"))] >= 4) {
enc_active = T
enc_num <- enc_num + 1
enc_data[i, "Tx_Episode"] <- enc_num
} else {
enc_data[i, "Tx_Episode"] <- 0
}
}
}
enc_list[[id]] <- enc_data
}
df_final <- merge(
do.call('rbind', enc_list),
df,
all.y = T
)
df_final$Tx_Episode[is.na(df_final$Tx_Episode)] <- 0
Output
Patient_ID Visit_Date Attendance Tx_Episode
1 1 2023-01-01 Yes 1
2 1 2023-01-02 Yes 1
3 1 2023-01-03 Yes 1
4 1 2023-01-04 Yes 1
5 1 2023-02-01 Yes 1
6 1 2023-03-01 Yes 1
7 1 2023-04-01 No 0
8 1 2023-05-01 Yes 0
9 1 2023-06-01 No 0
10 1 2023-07-01 Yes 2
11 1 2023-07-02 Yes 2
12 1 2023-07-03 Yes 2
13 1 2023-07-04 Yes 2
14 1 2023-08-01 Yes 2
15 2 2023-01-01 Yes 0
16 2 2023-02-01 Yes 0
17 2 2023-03-01 Yes 1
18 2 2023-03-02 Yes 1
19 2 2023-03-03 Yes 1
20 2 2023-03-04 Yes 1
21 2 2023-04-01 Yes 1
22 2 2023-04-02 Yes 1
23 2 2023-04-03 Yes 1
24 2 2023-04-04 Yes 1
25 2 2023-06-12 Yes 0