Search code examples
rbinaryrecordsindicator

Binary variable to multiple records of same key based on characters in another field in R


I have a table of doctor visits wherein there are sometimes multiple records for the same encounter key (Enc_Key) if there are multiple diagnoses, such as:

Enc_Key | Patient_Key |   Enc_Date   | Diag
  123         789         20160512      asthma
  123         789         20160512      fever
  123         789         20160515      coughing
  546         013         20160226      flu      
  564         444         20160707      laceration
  789         226         20160707      asthma
  789         226         20160707      fever

I am trying to create an indicator variable Diag_Ind based on the value of the character variable Diag, but I need to apply it for the entire encounter. In other words, if I get a value of "asthma" for Diag for a record, then I want to apply a "1" for the Diag_Ind to every record that has the same Enc_Key, such as below:

Enc_Key | Patient_Key |   Enc_Date   | Diag            | Diag_Ind
  123         789         20160512      asthma             1
  123         789         20160512      fever              1
  123         789         20160515      coughing           1
  546         013         20160226      flu                0     
  564         444         20160707      laceration         0
  789         226         20160707      asthma attack      1
  789         226         20160707      fever              1

I can't seem to figure out a way to apply this binary indicator to multiple records. I have been using a line of code that resembles this:

tbl$Diag_Ind <- ifelse(grepl('asthma',tolower(tbl$Diag)),1,0)

but this would only assign a value of "1" to the single record with that Diag value, such as this:

Enc_Key | Patient_Key |   Enc_Date   | Diag            | Diag_Ind
  123         789         20160512      asthma             1
  123         789         20160512      fever              0
  123         789         20160515      coughing           0
  546         013         20160226      flu                0     
  564         444         20160707      laceration         0
  789         226         20160707      asthma attack      1
  789         226         20160707      fever              0

I'm unsure of how to apply it to the rest of the records with the same Enc_Key value


Solution

  • We can use base R ave to check if any value in every group of Enc_Key has asthma in it

    df$Diag_Ind<- ave(df$Diag, df$Enc_Key,FUN=function(x) as.integer(any(grep("asthma", x))))
    
    df
    #  Enc_Key Patient_Key Enc_Date       Diag Diag_Ind
    #1     123         789 20160512     asthma        1
    #2     123         789 20160512      fever        1
    #3     123         789 20160515   coughing        1
    #4     546          13 20160226        flu        0
    #5     564         444 20160707 laceration        0
    #6     789         226 20160707     asthma        1
    #7     789         226 20160707      fever        1
    

    Similar solution with dplyr

    library(dplyr)
    df %>%
     group_by(Enc_Key) %>%
     mutate(Diag_Ind = as.numeric(any(grep("asthma", Diag))))
    
    #   Enc_Key Patient_Key Enc_Date    Diag    Diag_Ind
    #    (int)       (int)    (int)     (fctr)    (dbl)
    #1     123         789 20160512     asthma        1
    #2     123         789 20160512      fever        1
    #3     123         789 20160515   coughing        1
    #4     546          13 20160226        flu        0
    #5     564         444 20160707 laceration        0
    #6     789         226 20160707     asthma        1
    #7     789         226 20160707      fever        1