Is there a way to filter rows based on a match on two strings? For example, I want to get all the rows whose name contains both won and le.
library(dplyr)
library(stringr)
df <- tibble(name = c("Cathy le", "won Xion le", "Matt le won", "stephen won", "James Matt"),
             value = 5:9)
name value
<chr> <int>
Cathy le 5
won Xion le 6
Matt le won 7
stephen won 8
James Matt 9
The output I am looking for is:
name value
<chr> <int>
won Xion le 6
Matt le won 7
If I try df %>% filter(str_detect(name, "won|le")), I get the result below, because the | in the pattern acts as an OR:
name value
<chr> <int>
Cathy le 5
won Xion le 6
Matt le won 7
stephen won 8
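For reference, the | inside the pattern is regex alternation, so str_detect() returns TRUE for every string that contains either alternative. A small sketch, assuming the five names from the example are typed in as a plain character vector:

```r
library(stringr)

# The five names from the example data
people <- c("Cathy le", "won Xion le", "Matt le won", "stephen won", "James Matt")

# "won|le" matches if EITHER "won" or "le" occurs anywhere in the string
either <- str_detect(people, "won|le")

print(either)  # TRUE TRUE TRUE TRUE FALSE: only "James Matt" contains neither
```

filter() keeps the rows where this vector is TRUE, which is why four rows come back.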
What I am looking for is something like "won&&le". Can I achieve this using str_detect?
Here are a few different ways of doing it:
filter(df, str_detect(name, "won"), str_detect(name, "le"))    # multiple str_detect() calls; filter() combines its conditions with AND
filter(df, str_detect(name, "(?=.*won)(?=.*le)"))              # using lookaheads
filter(df, str_detect(name, "won.*le|le.*won"))                # Jared's first answer: either order
filter(df, str_detect(name, "won") & str_detect(name, "le"))   # explicit &, equivalent to the first way
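If more than two strings are required, the str_detect() calls can be generated instead of written out by hand. A minimal sketch, assuming a hypothetical helper has_all() (not part of stringr or dplyr) that combines one fixed-string str_detect() per word with base R's Reduce():

```r
library(dplyr)
library(stringr)

# Hypothetical helper: TRUE where x contains every string in `words`.
# fixed() matches each word literally, so regex metacharacters need no escaping.
has_all <- function(x, words) {
  Reduce(`&`, lapply(words, function(w) str_detect(x, fixed(w))))
}

df <- tibble(
  name  = c("Cathy le", "won Xion le", "Matt le won", "stephen won", "James Matt"),
  value = 5:9
)

result <- filter(df, has_all(name, c("won", "le")))
print(result)  # the "won Xion le" and "Matt le won" rows
```

This is the same AND logic as the explicit & version, just built from a vector of words.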
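To match whole words only, rather than the strings as parts of larger words (as Jared commented), you can add a word boundary \b on either side of each word you're looking for. In an R string literal the backslash has to be doubled, so it is written as "\\b", e.g.: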
filter(df, str_detect(name, "(?=.*\\bwon\\b)(?=.*\\ble\\b)"))
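The same lookahead pattern can also be assembled from a vector of words with paste0(). A sketch, assuming the words contain no regex metacharacters (they are pasted into the pattern unescaped); it also compares the \b version with the plain lookahead on a made-up string, "wonder lemon", where won and le only occur inside longer words:

```r
library(stringr)

words <- c("won", "le")

# Build (?=.*\bwon\b)(?=.*\ble\b): one lookahead per word, each wrapped in \b
pattern <- paste0("(?=.*\\b", words, "\\b)", collapse = "")

x <- c("won Xion le", "Matt le won", "wonder lemon", "stephen won")

with_boundaries    <- str_detect(x, pattern)
without_boundaries <- str_detect(x, "(?=.*won)(?=.*le)")

print(with_boundaries)     # TRUE TRUE FALSE FALSE
print(without_boundaries)  # TRUE TRUE TRUE FALSE
```

Only the boundary version rejects "wonder lemon"; neither accepts "stephen won", since it lacks le.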