Search code examples
rrepeatlongitudinal

How to filter repeated data based on the number of days between dates in R


I have longitudinal patient data in R. I would like to subset patients in the patid column based on the observation_date where I only keep patients that have the second observation_date occur at least 48 days apart or more. Noting that observation_date may be more than 2 dates.

Table1:

patid observation_date
1 07/07/2016
1 07/07/2019
2 07/05/2015
2 02/12/2016
3 07/05/2015
3 07/06/2015
4 07/05/2015
4 02/12/2016

Solution

  • Reports the diff in days and filtered out the ones that has less than 48 days diff

    library(tidyverse)
    library(lubridate)
    
    df <- read_table("patid observation_date
    1   07/07/2016
    1   07/07/2019
    1   07/07/2020
    2   07/05/2015
    2   02/12/2016
    3   07/05/2015
    3   07/06/2015
    4   07/05/2015
    4   02/12/2016") %>% 
      mutate(observation_date = observation_date %>% 
               as.Date("%m/%d/%Y"))
    
    df %>%  
      group_by(patid) %>%  
      summarise(diff = interval(first(observation_date), 
                                nth(observation_date, 2)) %>% # Select the second observation
                  as.numeric("days")) %>% 
      filter(diff >= 48)
    
    # A tibble: 3 x 2
      patid  diff
      <dbl> <dbl>
    1     1  1095
    2     2   222
    3     4   222