Here is a character vector:
a <- c("Recherche impliquant la personne humaine (RIPH) Médicaments 3",
       "Recherche impliquant la personne humaine (RIPH) Hors Produits de santé 3",
       "Recherche impliquant la personne humaine (RIPH) dispositif médical 1")
I want to identify all elements containing certain keywords.
First, I identify all elements containing the word "Recherche":
grepl("recherche",a,ignore.case = TRUE)
[1] TRUE TRUE TRUE
Now I want to identify only the elements that contain all of these keywords at the same time:
c("recherche", "impliquant", "personne", "humaine", "3")
The expected result is:
[1] TRUE TRUE FALSE
I tried this:
grepl(c("Recherche,impliquant , personne, humaine, 3"),a)
but it didn't work; the output is:
FALSE FALSE FALSE
You can do this with multiple lookaheads, (?=...), where each lookahead asserts that a keyword is present anywhere in the string. The (?i) flag makes the matching case-insensitive, and perl = TRUE is required because lookaheads are a PCRE feature:
grep("(?i)(?=.*recherche.*)(?=.*impliquant.*)(?=.*personne.*)(?=.*humaine.*)(?=.*3.*).*",
a,
value = TRUE,
perl = TRUE)
[1] "Recherche impliquant la personne humaine (RIPH) Médicaments 3"
[2] "Recherche impliquant la personne humaine (RIPH) Hors Produits de santé 3"
This method also works with grepl; just omit value = TRUE:
grepl("(?i)(?=.*recherche.*)(?=.*impliquant.*)(?=.*personne.*)(?=.*humaine.*)(?=.*3.*).*",
a,
perl = TRUE)
[1] TRUE TRUE FALSE
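If the keyword list changes often, writing the pattern by hand gets tedious. As a rough sketch (the names keywords, pattern, has_all and has_all_2 are my own, not from the question), the lookahead pattern can be built from the keyword vector with paste0. A second option skips lookaheads and combines one grepl call per keyword with a logical AND. Both assume the keywords contain no regex metacharacters; otherwise they would need escaping first.

```r
a <- c("Recherche impliquant la personne humaine (RIPH) Médicaments 3",
       "Recherche impliquant la personne humaine (RIPH) Hors Produits de santé 3",
       "Recherche impliquant la personne humaine (RIPH) dispositif médical 1")

# Hypothetical name for the keyword vector from the question
keywords <- c("recherche", "impliquant", "personne", "humaine", "3")

# Option 1: build one lookahead per keyword, e.g. "(?i)(?=.*recherche)(?=.*impliquant)..."
pattern <- paste0("(?i)", paste0("(?=.*", keywords, ")", collapse = ""))
has_all <- grepl(pattern, a, perl = TRUE)

# Option 2: no lookaheads; run grepl once per keyword, then AND the results element-wise
has_all_2 <- Reduce(`&`, lapply(keywords, grepl, x = a, ignore.case = TRUE))

print(has_all)
print(has_all_2)
```

Note that "3" here matches any 3 in the string (including one inside a longer number such as 13); if that matters, a word boundary such as "\\b3\\b" could be used for that keyword.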