Tags: r, tidyverse, lubridate

Keep rows that fall within a specific time interval for different conditions, grouped by patient


Here's a reprex for illustration.

library(tidyverse)

set.seed(1337)
# 400 random visits between 2020-01-01 and 2021-01-01 for 13 patients,
# each visit either "medical" or "veterinary"
df <- tibble(
  date_visit = sample(seq(as.Date("2020/01/01"), as.Date("2021/01/01"), by = "day"),
                      400, replace = TRUE),
  patient_id = as.factor(paste("patient", sample(1:13, 400, replace = TRUE), sep = "_")),
  type_of_visit = as.factor(sample(c("medical", "veterinary"), 400, replace = TRUE))
)

What I'm trying to do is create a data frame that keeps the patient_id (grouped by, I assume) and the visit types whenever that patient has had 2 different types of visit within less than 24 hours. Alternatively, I could add a TRUE/FALSE variable that says whether that condition is met.

I tried using a left join by patient_id to get two visits side by side as separate variables, but that takes too much computing time because my original data frame is much longer than this one.
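If the real data has a time of day, one way to check the 24-hour condition without a self-join is to sort each patient's visits by time and compare every visit with the previous one using dplyr's lag(). This is a minimal sketch assuming a hypothetical POSIXct column visit_time and made-up sample rows; the reprex above only has the Date column date_visit.

```r
library(dplyr)

# Hypothetical data: visit_time is an assumed POSIXct column; the reprex
# only has a Date column (date_visit), so these rows are made up.
visits <- tibble(
  patient_id    = c("p1", "p1", "p2", "p2", "p3", "p3"),
  visit_time    = as.POSIXct(c("2020-01-01 22:00:00", "2020-01-02 08:00:00",
                               "2020-01-01 08:00:00", "2020-01-02 09:00:00",
                               "2020-01-05 10:00:00", "2020-01-05 11:00:00"),
                             tz = "UTC"),
  type_of_visit = c("medical", "veterinary",
                    "medical", "veterinary",
                    "medical", "medical")
)

flagged <- visits %>%
  arrange(patient_id, visit_time) %>%
  group_by(patient_id) %>%
  mutate(
    # hours since the same patient's previous visit (NA for the first one)
    gap_h = (as.numeric(visit_time) - lag(as.numeric(visit_time))) / 3600,
    # TRUE when the previous visit was a different type less than 24 h earlier
    mixed_24h = !is.na(gap_h) & gap_h < 24 &
      type_of_visit != lag(type_of_visit)
  ) %>%
  ungroup()

# One TRUE/FALSE per patient
per_patient <- flagged %>%
  group_by(patient_id) %>%
  summarise(flag = any(mixed_24h))
```

Here p1 is flagged (10 hours apart across midnight, which a same-day grouping would miss), p2 is not (25 hours apart) and p3 is not (same type). Comparing only consecutive visits is enough: if two visits of different types are less than 24 hours apart, some consecutive pair between them also changes type and is no further apart. The work is one sort plus a single pass, instead of the n-squared rows per patient that a self-join on patient_id produces.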

Can someone point me in the right direction?

Thank you


Solution

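  • Maybe this will help. Since date_visit is a Date (no time of day), "less than 24 hours" is treated here as "on the same day":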

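    library(dplyr)
    
    df %>%
      group_by(patient_id, date_visit) %>%
      # TRUE for a patient-day with 2+ distinct visit types
      summarise(flag = n_distinct(type_of_visit) >= 2, .groups = "drop_last") %>%
      # still grouped by patient_id: TRUE if any of that patient's days is flagged
      summarise(flag = any(flag))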
    
    #  patient_id flag 
    #   <fct>      <lgl>
    # 1 patient_1  TRUE 
    # 2 patient_10 FALSE
    # 3 patient_11 TRUE 
    # 4 patient_12 FALSE
    # 5 patient_13 FALSE
    # 6 patient_2  FALSE
    # 7 patient_3  FALSE
    # 8 patient_4  FALSE
    # 9 patient_5  TRUE 
    #10 patient_6  FALSE
    #11 patient_7  TRUE 
    #12 patient_8  TRUE 
    #13 patient_9  TRUE 
    

    If you want to keep all the rows for those patient IDs:

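    df %>%
      group_by(patient_id, date_visit) %>%
      # mutate() keeps every original row; flag = 2+ visit types on that day
      mutate(flag = n_distinct(type_of_visit) >= 2) %>%
      group_by(patient_id) %>%
      filter(any(flag)) %>%   # keep every row of any flagged patient
      ungroup()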
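For the other option in the question (a TRUE/FALSE variable instead of a filtered result), one sketch is to compute the per-patient flag as in the first snippet and join it back onto the visits. The lookup table has a single row per patient, so this join does not multiply rows the way a self-join on patient_id does. The toy data below is made up for illustration, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Small hand-made visit table (hypothetical), same columns as the reprex
toy <- tibble(
  patient_id    = c("patient_1", "patient_1", "patient_2", "patient_2"),
  date_visit    = as.Date(c("2020-03-01", "2020-03-01", "2020-03-01", "2020-03-02")),
  type_of_visit = c("medical", "veterinary", "medical", "veterinary")
)

# Lookup table with one row per patient: does any day have 2+ visit types?
patient_flags <- toy %>%
  group_by(patient_id, date_visit) %>%
  summarise(day_flag = n_distinct(type_of_visit) >= 2, .groups = "drop_last") %>%
  summarise(flag = any(day_flag), .groups = "drop")

# Attach the per-patient flag to every original row
toy_flagged <- left_join(toy, patient_flags, by = "patient_id")
toy_flagged
```

patient_1 gets TRUE on both rows (two types on 2020-03-01), while patient_2 gets FALSE because its two types fall on different dates.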