Here's a reprex for illustration.
library(tidyverse)
set.seed(1337)
df <- tibble(
  date_visit = sample(
    seq(as.Date("2020/01/01"), as.Date("2021/01/01"), by = "day"),
    400, replace = TRUE
  ),
  patient_id = as.factor(paste("patient", sample(seq(1, 13), 400, replace = TRUE), sep = "_")),
  type_of_visit = as.factor(sample(c("medical", "veterinary"), 400, replace = TRUE))
)
What I'm trying to do is create a data frame that keeps the patient_id (using group_by, I assume) and the visit types for any patient who has had 2 different types of visit in less than 24 hours. Alternatively, I could add a variable that is TRUE/FALSE depending on whether that condition is met.
I tried a left join by patient_id so I could work with 2 different variables, but that takes too much computing time (my original data frame is much longer than this one).
Can someone point me in the right direction?
Thank you
Maybe this will help -
library(dplyr)
df %>%
  group_by(patient_id, date_visit) %>%
  summarise(flag = n_distinct(type_of_visit) >= 2) %>%
  summarise(flag = any(flag))
# patient_id flag
# <fct> <lgl>
# 1 patient_1 TRUE
# 2 patient_10 FALSE
# 3 patient_11 TRUE
# 4 patient_12 FALSE
# 5 patient_13 FALSE
# 6 patient_2 FALSE
# 7 patient_3 FALSE
# 8 patient_4 FALSE
# 9 patient_5 TRUE
#10 patient_6 FALSE
#11 patient_7 TRUE
#12 patient_8 TRUE
#13 patient_9 TRUE
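The second summarise() works per patient because the first summarise() peels off the last grouping variable (date_visit), which leaves the result grouped by patient_id only. Here is a minimal sketch that makes this explicit with .groups = "drop_last" (available in dplyr 1.0.0 and later). The `visits` tibble is a small made-up data set used only for illustration, not the df from the question.

```r
library(dplyr)

# Hypothetical toy data: patient_1 has two visit types on the same day
visits <- tibble(
  patient_id    = c("patient_1", "patient_1", "patient_1", "patient_2", "patient_2"),
  date_visit    = as.Date(c("2020-03-01", "2020-03-01", "2020-05-10",
                            "2020-03-01", "2020-04-02")),
  type_of_visit = c("medical", "veterinary", "medical", "medical", "medical")
)

# Step 1: one row per patient and day, TRUE if that day has 2+ visit types.
# "drop_last" removes date_visit from the grouping and keeps patient_id.
per_day <- visits %>%
  group_by(patient_id, date_visit) %>%
  summarise(flag = n_distinct(type_of_visit) >= 2, .groups = "drop_last")

# Shows which grouping variables are left after step 1
group_vars(per_day)

# Step 2: one row per patient, TRUE if any of their days was flagged
per_patient <- per_day %>%
  summarise(flag = any(flag))

per_patient
```

Spelling out .groups also silences the message that summarise() prints about regrouping.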
If you want to keep all the summarised rows (one per patient and date) for those patient IDs:
df %>%
  group_by(patient_id, date_visit) %>%
  summarise(flag = n_distinct(type_of_visit) >= 2) %>%
  filter(any(flag))
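Note that the pipeline above returns one row per patient and date, so type_of_visit is no longer in the result. If the goal is to keep the original visit rows (including the visit type), one possible variant is to use mutate() instead of summarise() and then regroup by patient. This is only a sketch under that assumption, shown on a small made-up `visits` tibble rather than the df from the question.

```r
library(dplyr)

# Hypothetical toy data: patient_1 has two visit types on the same day
visits <- tibble(
  patient_id    = c("patient_1", "patient_1", "patient_1", "patient_2", "patient_2"),
  date_visit    = as.Date(c("2020-03-01", "2020-03-01", "2020-05-10",
                            "2020-03-01", "2020-04-02")),
  type_of_visit = c("medical", "veterinary", "medical", "medical", "medical")
)

flagged <- visits %>%
  group_by(patient_id, date_visit) %>%
  # mutate() keeps every row; flag is TRUE on days with 2+ visit types
  mutate(flag = n_distinct(type_of_visit) >= 2) %>%
  group_by(patient_id) %>%
  # keep all original rows of any patient flagged on at least one day
  filter(any(flag)) %>%
  ungroup()

flagged
```

Here all three rows of patient_1 are kept, with flag showing which of them fall on the day with both visit types, and patient_2 is dropped.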