How to use case_when and grep together to define a new variable


I have a data frame that can be built with the following code:

df<-structure(list(Gender = c("M", "F", "M", "F", "F"), Location = c("Cleveland, OH", 
"New Olreans, LA", "Chicago, IL", "Strongsville, OH", "Boston, MA"
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))

I want to build a new variable, Comment, as follows.

The rule is:

  • if Gender=="F" and we find "OH" in Location, then Comment = "Female in OH"
  • if Gender=="F" and we can't find "OH" in Location, then Comment = "Female in Other"
  • if Gender=="M" and we find "OH" in Location, then Comment = "Male in OH"
  • if Gender=="M" and we can't find "OH" in Location, then Comment = "Male in Other"

So my code is:

 df<-df %>% 
     mutate(Comment = case_when(Gender=="F" & grep("OH", df$Location)~"Female in OH",
                            Gender=="F" & !grep("OH", df$Location)~ "Female in Other",                        
                            Gender=="M" & grep("OH", df$Location2)~ "Male in OH",
                            Gender=="M" & !grep("OH", df$Location)~ "Male in other)",
                            TRUE~NA))

It doesn't work. Could anyone give me some guidance on this?


Solution

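  • Use grepl() rather than grep(). grepl() returns a logical vector with one TRUE/FALSE value per element, whereas grep() returns the indices of the matching elements. For example (this also fixes the other typos, such as df$Location2 and the stray parenthesis in "Male in other)"):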

    library(dplyr)  # provides %>%, mutate() and case_when()

    df %>%
      mutate(Comment = case_when(Gender=="F" & grepl("OH", Location)~"Female in OH",
                                 Gender=="F" & !grepl("OH", Location)~ "Female in Other",
                                 Gender=="M" & grepl("OH", Location)~ "Male in OH",
                                 Gender=="M" & !grepl("OH", Location)~ "Male in other"))
    

    I took out the TRUE~NA branch since your four conditions already cover all the cases, and case_when() returns NA by default when no condition matches. If you do want it to be explicit, use the typed version of NA for characters, NA_character_, so that it has the same type as the other results.

    df %>% 
      mutate(Comment = case_when(Gender=="F" & grepl("OH", Location)~"Female in OH",
                                 Gender=="F" & !grepl("OH", Location)~ "Female in Other",                        
                                 Gender=="M" & grepl("OH", Location)~ "Male in OH",
                                 Gender=="M" & !grepl("OH", Location)~ "Male in other",
                                 TRUE~NA_character_))
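
To see the difference between grep() and grepl() in isolation, here is a minimal base R sketch using the Gender and Location values from the question. The variable names (gender, location, idx, is_oh and so on) are only illustrative, and the stricter pattern at the end assumes that the state code always comes last in the string, after a comma and a space.

```r
# Sample values copied from the question's data frame.
gender   <- c("M", "F", "M", "F", "F")
location <- c("Cleveland, OH", "New Olreans, LA", "Chicago, IL",
              "Strongsville, OH", "Boston, MA")

# grep() returns the positions of the matching elements: 1 and 4.
idx <- grep("OH", location)

# grepl() returns one TRUE/FALSE per element, so it lines up with gender.
is_oh <- grepl("OH", location)

# A logical vector combines element-wise with another condition.
female_in_oh <- gender == "F" & is_oh

# With grep(), the two indices are recycled (with a warning) and, being
# non-zero, are treated as TRUE, so the "OH" condition is effectively lost.
bad <- suppressWarnings(gender == "F" & idx)

# Optional stricter pattern (an assumption about the data layout): only
# match "OH" as a state code at the end of the string.
is_oh_strict <- grepl(", OH$", location)

print(idx)
print(is_oh)
print(female_in_oh)
print(bad)
```

Because is_oh has the same length as gender, it can be used directly inside case_when(), which is what the corrected code above relies on.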