Search code examples
rdplyrconditional-statementslagrolling-computation

Mutate new column based on moving window of fixed date interval size, in R


I have data for one patient in R, which shows dates when they tested positive for a certain condition. The data looks like this:

      date  positive
2005-02-22      yes
2005-04-26       no
2005-08-02      yes
2005-10-04       no
2005-12-06      yes
2006-03-14       no
2006-06-06       no
2006-09-12      yes
2006-12-19      yes
2007-03-27      yes

Now I introduce a new definition. The patient's condition is defined as "chronic positive" if "current test is positive, and >=50% of tests in the prior 365 days were positive". So I want to create an output dataset that tells me whether the patient was chronically positive at each date. For example, the output should look like this (e.g. on 2006-09-12, they are "positive" but not "chronic positive" because 3 out of 4 visits in the previous 365 days were negative):

      date  positive  chronic
2005-02-22      yes        no
2005-04-26       no        no
2005-08-02      yes       yes
2005-10-04       no        no
2005-12-06      yes       yes
2006-03-14       no        no
2006-06-06       no        no
2006-09-12      yes        no
2006-12-19      yes        no
2007-03-27      yes       yes

How can I do this? At each row of interest, I need to be able to look at previous rows (within the last 365 days) and assess what proportion of them were positive. I think I could use a combination of the lead/lag functions and dplyr, but I would appreciate an example of how this can be done.

The original data can be reproduced with:

dat <- structure(list(date = structure(c(12836, 12899, 12997, 13060, 13123, 13221, 13305, 13403, 13501, 13599), class = "Date"), 
                      positive = c("yes", "no", "yes", "no", "yes", "no", "no", "yes", "yes", "yes")), 
                 row.names = c(NA, 10L), class = "data.frame")

Solution

  • Here is one way -

    library(dplyr)
    library(purrr)
    
    dat %>%
      mutate(chronic = map_chr(row_number(), ~{
        inds <- between(date, date[.x] - 365, date[.x] - 1)
        if(positive[.x] == "yes" && any(inds) && mean(positive[inds] == 'yes') >= 0.5) 'yes' else 'no'
        }))
    
    #         date positive chronic
    #1  2005-02-22      yes      no
    #2  2005-04-26       no      no
    #3  2005-08-02      yes     yes
    #4  2005-10-04       no      no
    #5  2005-12-06      yes     yes
    #6  2006-03-14       no      no
    #7  2006-06-06       no      no
    #8  2006-09-12      yes      no
    #9  2006-12-19      yes      no
    #10 2007-03-27      yes     yes