I have data for one patient in R, which shows dates when they tested positive for a certain condition. The data looks like this:
date positive
2005-02-22 yes
2005-04-26 no
2005-08-02 yes
2005-10-04 no
2005-12-06 yes
2006-03-14 no
2006-06-06 no
2006-09-12 yes
2006-12-19 yes
2007-03-27 yes
Now I introduce a new definition. The patient's condition is defined as "chronic positive" if "current test is positive, and >=50% of tests in the prior 365 days were positive". So I want to create an output dataset that tells me whether the patient was chronically positive at each date. For example, the output should look like this (e.g. on 2006-09-12, they are "positive" but not "chronic positive" because 3 out of 4 visits in the previous 365 days were negative):
date positive chronic
2005-02-22 yes no
2005-04-26 no no
2005-08-02 yes yes
2005-10-04 no no
2005-12-06 yes yes
2006-03-14 no no
2006-06-06 no no
2006-09-12 yes no
2006-12-19 yes no
2007-03-27 yes yes
How can I do this? At each row of interest, I need to be able to look at previous rows (within the last 365 days) and assess what proportion of them were positive. I think I could use a combination of the lead
/lag
functions and dplyr
, but I would appreciate an example of how this can be done.
The original data can be reproduced with:
dat <- structure(list(date = structure(c(12836, 12899, 12997, 13060, 13123, 13221, 13305, 13403, 13501, 13599), class = "Date"),
positive = c("yes", "no", "yes", "no", "yes", "no", "no", "yes", "yes", "yes")),
row.names = c(NA, 10L), class = "data.frame")
Here is one way -
library(dplyr)
library(purrr)
dat %>%
mutate(chronic = map_chr(row_number(), ~{
inds <- between(date, date[.x] - 365, date[.x] - 1)
if(positive[.x] == "yes" && any(inds) && mean(positive[inds] == 'yes') >= 0.5) 'yes' else 'no'
}))
# date positive chronic
#1 2005-02-22 yes no
#2 2005-04-26 no no
#3 2005-08-02 yes yes
#4 2005-10-04 no no
#5 2005-12-06 yes yes
#6 2006-03-14 no no
#7 2006-06-06 no no
#8 2006-09-12 yes no
#9 2006-12-19 yes no
#10 2007-03-27 yes yes