I have a data set with multiple observations of a clinical outcome for each patient. The length between these time points is variable. I want to create a "roving baseline score" variable for the clinical outcome, that changes every time an increase or a decrease of the score is confirmed at the first time point that is at least 90 days ahead.
I have created an example data frame with observations for two patients (patient 1 has 6 observations, patient 2 only has 2 observations):
library(tidyverse)
df <- tibble(
patID = c("1", "1", "1", "1", "1", "1", "2", "2", "3", "3", "3", "3", "3", "3"),
time_point = c("1", "2", "3", "4", "5", "6", "1", "2", "1", "2", "3", "4", "5", "6"),
date = as.Date(c("2020-01-01", "2020-05-01", "2020-06-01", "2020-09-01", "2021-01-01", "2021-05-01", "2020-01-01", "2020-05-01", "2020-01-01", "2020-02-01", "2020-03-01", "2020-06-01", "2020-10-01", "2021-04-01")),
score = c("300", "100", "100", "100", "200", "200", "600", "400", "300", "200", "200", "100", "400", "300"))
The vector with the baseline score should result in the following:
baseline = c("300", "300", "300", "100", "100", "200", "600", "600", "300", "300", "300", "200", "200", "300")
I have tried creating the variable using a for loop, but I don't know how to make it such that it is done for each patient separately.
EDIT: I tried to write the code as following using sapply, but I get the error "argument is of length 0"
df_baseline <- df %>%
group_by(patID) %>%
mutate(baseline = lag(score, default = first(score)),
closest_index_90 = sapply(1:n(), function(i) {
valid_indices <- which(date[i]+90 <= date[(i+1):n()])
if (length(valid_indices)>0) {return(first(valid_indices) + i)}
else {return(NA)}
}),
baseline = sapply(1:n(), function(i) {
if (score[i:closest_index_90[i]] > baseline[i - 1] | score[i:closest_index_90[i]] < baseline[i - 1]) {return(score[which.min(abs(score[i:closest_index_90[i]]-baseline[i-1]))])}
else {return(baseline[i-1])}}))
I am relatively new to R so my skills with for loops and sapply functions are limited. I have looked through many questions asked on here but none seemed to answer my question specifically. Any help would be greatly appreciated!
Found a solution using purrr::accumulate2
df %>%
group_by(patID) %>%
mutate(
closest_index_90 = sapply(1:n(), function(i) {
valid_indices <- which(date[i]+90 <= date[(i+1):n()])
if (length(valid_indices)>0) {return(first(valid_indices) + i)}
else {return(NA)}
}),
baseline = accumulate2(
time_point,
replace_na(closest_index_90, max(i)),
\(b, i, r) {
if (b>max(score[i:r])) {max(score[i:r]}
else if (b<min(score[i:r])) {min(score[i:r]}
else b
}, .init = score[[1]]
)[-1])