I have a table of doctor visits wherein there are sometimes multiple records for the same encounter key (Enc_Key
) if there are multiple diagnoses, such as:
Enc_Key | Patient_Key | Enc_Date | Diag
123 789 20160512 asthma
123 789 20160512 fever
123 789 20160515 coughing
546 013 20160226 flu
564 444 20160707 laceration
789 226 20160707 asthma
789 226 20160707 fever
I am trying to create an indicator variable Diag_Ind
based on the value of the character variable Diag
, but I need to apply it for the entire encounter. In other words, if I get a value of "asthma" for Diag
for a record, then I want to apply a "1" for the Diag_Ind
to every record that has the same Enc_Key
, such as below:
Enc_Key | Patient_Key | Enc_Date | Diag | Diag_Ind
123 789 20160512 asthma 1
123 789 20160512 fever 1
123 789 20160515 coughing 1
546 013 20160226 flu 0
564 444 20160707 laceration 0
789 226 20160707 asthma attack 1
789 226 20160707 fever 1
I can't seem to figure out a way to apply this binary indicator to multiple records. I have been using a line of code that resembles this:
tbl$Diag_Ind <- ifelse(grepl('asthma',tolower(tbl$Diag)),1,0)
but this would only assign a value of "1" to the single record with that Diag value, such as this:
Enc_Key | Patient_Key | Enc_Date | Diag | Diag_Ind
123 789 20160512 asthma 1
123 789 20160512 fever 0
123 789 20160515 coughing 0
546 013 20160226 flu 0
564 444 20160707 laceration 0
789 226 20160707 asthma attack 1
789 226 20160707 fever 0
I'm unsure of how to apply it to the rest of the records with the same Enc_Key
value
We can use base R ave
to check if any value in every group of Enc_Key
has asthma
in it
df$Diag_Ind<- ave(df$Diag, df$Enc_Key,FUN=function(x) as.integer(any(grep("asthma", x))))
df
# Enc_Key Patient_Key Enc_Date Diag Diag_Ind
#1 123 789 20160512 asthma 1
#2 123 789 20160512 fever 1
#3 123 789 20160515 coughing 1
#4 546 13 20160226 flu 0
#5 564 444 20160707 laceration 0
#6 789 226 20160707 asthma 1
#7 789 226 20160707 fever 1
Similar solution with dplyr
library(dplyr)
df %>%
group_by(Enc_Key) %>%
mutate(Diag_Ind = as.numeric(any(grep("asthma", Diag))))
# Enc_Key Patient_Key Enc_Date Diag Diag_Ind
# (int) (int) (int) (fctr) (dbl)
#1 123 789 20160512 asthma 1
#2 123 789 20160512 fever 1
#3 123 789 20160515 coughing 1
#4 546 13 20160226 flu 0
#5 564 444 20160707 laceration 0
#6 789 226 20160707 asthma 1
#7 789 226 20160707 fever 1