I am trying to create a function that takes overlapping periods into account in this ordered data by id, start_ins, rxdate:
dt <- data.table(
id = c(1, 2, 2, 3, 3, 3),
start_ins = as.Date(c("2000-10-12", "2000-03-31", "2000-03-31", "2000-01-11", "2000-01-11", "2000-01-11")),
rxdate = as.Date(c("2010-01-04", "2010-03-30", "2010-04-08", "2003-12-29", "2004-01-12", "2004-03-10")),
amount = c(30, 28, 100, 60, 1, 10),
rx_end = as.Date(c("2010-02-03", "2010-04-27", "2010-07-17", "2004-02-27", "2004-01-13", "2004-03-20"))
)
In this set, observations are grouped by id and start_ins. People also purchase Rx on a date and the number of pills (amount). rx_end is
[, rx_end := rxdate + amount]
However, if you look closely, you will find that rx_end for line 2 comes after rxdate for line 3. In this case, there is an overlap between rxdate line 3 and end of previous prescription. I want to assume that a person finished a rx before starting a new rx. In this case rxdate for line 3 should start the day after rx_end of line 2 AND rx_end for line 3 should be updated to take into account the new rxdate in line 3. Same logic applies for lines 4 and 5. In this case, the new data set would look something like this:
dt1 <- data.table(
id = c(1, 2, 2, 3, 3, 3),
start_ins = as.Date(c("2000-10-12", "2000-03-31", "2000-03-31", "2000-01-11", "2000-01-11", "2000-01-11")),
rxdate = as.Date(c("2010-01-04", "2010-03-30", "2010-04-28", "2003-12-29", "2004-02-28", "2004-03-10")),
amount = c(30, 28, 100, 60, 1, 10),
rx_end = as.Date(c("2010-02-03", "2010-04-27", "2010-08-06", "2004-02-27", "2004-02-29", "2004-03-20"))
)
I tried writing this function using data.table:
overlapper <- function (dt){
dt[,{
prev_end_date <- shift(rx_end, 1, type = "lag")
for (i in 2:.N){
if(!is.na(prev_end_date[i-1]) & prev_end_date >= dt[i, rxdate]){
dt[i, rxdate := prev_end_date[i-1] +1]
dt[i, rx_end := rxdate + amount]
}
prev_end_date[i] <- dt[i, rx_end]
}
return(dt)}, by = .(id, start_ins)]}
Keep in mind that some people have only one observation, so these should not be taken into account and that is why I am doing a grouped operation by id and start_ins. I always want to look at line n and compare it to n-1 to update line n ONLY if there is an overlap. If there is no over lap, I would keep the line as is and move on to the next line.
You can do this using data.table::shift()
.
rx_days
.<
to <=
).rx_days
.Here are steps 1-4, which work with your sample data:
dt[, c("rxdate", "rx_end") := {
i <- seq(.N) # never replace first row
shifted_rx_end <- shift(rx_end, 1, fill = rxdate[1])
shift_dates <- fifelse(i == 1, FALSE, rxdate <= shifted_rx_end)
rxdate <- fifelse(shift_dates, shifted_rx_end + 1, rxdate)
rx_end[shift_dates] <- rxdate[shift_dates] + amount[shift_dates]
.(rxdate, rx_end)
}, .(id, start_ins)]
identical(dt, dt1)
# [1] TRUE
However, imagine this new data frame:
id start_ins rxdate amount rx_end
<num> <Date> <Date> <num> <Date>
1: 1 2000-10-12 2010-01-04 30 2010-02-03
2: 2 2000-03-31 2010-03-30 28 2010-04-27
3: 2 2000-03-31 2010-04-08 100 2010-07-17
4: 2 2000-03-31 2010-07-20 10 2010-07-30 # new row
5: 2 2000-03-31 2010-08-15 17 2010-09-01 # new row
6: 3 2000-01-11 2003-12-29 60 2004-02-27
7: 3 2000-01-11 2004-01-12 1 2004-01-13
8: 3 2000-01-11 2004-03-10 10 2004-03-20
If we apply this approach, row 3 will update correctly but row 4 will not reflect the changes in row 3, and so does not trigger what should be an update in row 5.
To resolve this we can put it in a recursive function that calls itself until there are no more changes:
shift_dates <- function(dt, grp = c("id", "start_ins")) {
old_state <- dt[, .(rxdate, rx_end)]
dt[, c("rxdate", "rx_end") := {
i <- seq(.N) # never replace first row
shifted_rx_end <- shift(rx_end, 1, fill = rxdate[1])
shift_dates <- fifelse(i == 1, FALSE, rxdate <= shifted_rx_end)
rxdate <- fifelse(shift_dates, shifted_rx_end + 1, rxdate)
rx_end[shift_dates] <- rxdate[shift_dates] + amount[shift_dates]
.(rxdate, rx_end)
}, grp]
if (identical(dt[, .(rxdate, rx_end)], old_state)) {
return(dt)
}
shift_dates(dt)
}
This means the changes from previous rows will propagate until complete:
shift_dates(dt2)
id start_ins rxdate amount rx_end
<num> <Date> <Date> <num> <Date>
1: 1 2000-10-12 2010-01-04 30 2010-02-03
2: 2 2000-03-31 2010-03-30 28 2010-04-27
3: 2 2000-03-31 2010-04-28 100 2010-08-06
4: 2 2000-03-31 2010-08-07 10 2010-08-17 # change propagates
5: 2 2000-03-31 2010-08-18 17 2010-09-04 # change propagates
6: 3 2000-01-11 2003-12-29 60 2004-02-27
7: 3 2000-01-11 2004-02-28 1 2004-02-29
8: 3 2000-01-11 2004-03-10 10 2004-03-20
dt2 <- data.table(
id = c(1, 2, 2, 2, 2, 3, 3, 3),
start_ins = as.Date(c("2000-10-12", "2000-03-31", "2000-03-31", "2000-03-31", "2000-03-31", "2000-01-11", "2000-01-11", "2000-01-11")),
rxdate = as.Date(c("2010-01-04", "2010-03-30", "2010-04-08", "2010-07-20", "2010-08-15", "2003-12-29", "2004-01-12", "2004-03-10")),
amount = c(30, 28, 100, 10, 17, 60, 1, 10),
rx_end = as.Date(c("2010-02-03", "2010-04-27", "2010-07-17", "2010-07-30", "2010-09-01", "2004-02-27", "2004-01-13", "2004-03-20"))
)