Select rows that contain one character but do not contain another in R


From the data frame below:

df <- data.frame(col1 = c("ap(pl)e", "or(a)ng%e", "pe%ar", "bl(u%)e", "red"),
                 col2 = c(1,3,5,4,8))
df
       col1 col2
1   ap(pl)e    1
2 or(a)ng%e    3
3     pe%ar    5
4   bl(u%)e    4
5       red    8

I want to filter rows whose values in col1 contain ( but not %.

     col1 col2
1 ap(pl)e    1
2   pe%ar    5
3     red    8

So I am using case_when along with grepl. This is going to be part of a dplyr pipeline.

#works
df %>%
    mutate(result = case_when((grepl("p", .[[1]]) & !grepl("r", .[[1]])) ~"Yes",
                                      TRUE~"No"))
#does not work
df %>%
    mutate(result = case_when((grepl("(", .[[1]]) & !grepl("%", .[[1]])) ~"Yes",
                                      TRUE~"No"))

This does not work for % and (. Is there any trick to make it work?


Solution

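  • Your code did not work because grepl treats its pattern as a regular expression, and ( is a metacharacter there: a lone "(" is an invalid pattern, so the call fails. Escape it with two backslashes ("\\("). The % character has no special meaning and needs no escaping.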

    df %>%
      mutate(result = case_when((grepl("\\(", .[[1]]) & !grepl("%", .[[1]])) ~"Yes",TRUE~"No"))
    

    Output:

           col1 col2 result
    1   ap(pl)e    1    Yes
    2 or(a)ng%e    3     No
    3     pe%ar    5     No
    4   bl(u%)e    4     No
    5       red    8     No
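
As an alternative to escaping, grepl can be told to treat the pattern as a literal string with fixed = TRUE. The sketch below is base R only, so it runs without dplyr; the helper names has_paren, has_percent and kept are just illustrative. The same grepl(..., fixed = TRUE) calls should also work inside mutate/case_when.

```r
# Same data frame as in the question.
df <- data.frame(col1 = c("ap(pl)e", "or(a)ng%e", "pe%ar", "bl(u%)e", "red"),
                 col2 = c(1, 3, 5, 4, 8))

# fixed = TRUE matches the pattern as a literal string,
# so "(" needs no backslashes.
has_paren   <- grepl("(", df$col1, fixed = TRUE)
has_percent <- grepl("%", df$col1, fixed = TRUE)

# Flag rows that contain "(" but not "%".
df$result <- ifelse(has_paren & !has_percent, "Yes", "No")

# Keep only the flagged rows (base R subsetting).
kept <- df[has_paren & !has_percent, ]
```

With fixed = TRUE the pattern is never parsed as a regular expression, which avoids having to remember which characters need escaping.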