
Extract the words from a pattern vector that match in each string


I have this vector of antibiotic names:

    atb <- c("acefa", "ampicilin", "fortum")

and this data frame:

    DF1 <- structure(list(ID = 1:3, Text = c("Person 1 take acefa and ampicilin", "fortum and acefa are antibiotics", "Person 3 has no antibiotics but ampicilin")), class = "data.frame", row.names = c(NA, -3L))

DF1
    
    ID                                      Text
    1           Person 1 take acefa and ampicilin
    2            fortum and acefa are antibiotics
    3   Person 3 has no antibiotics but ampicilin

And I would like to get this

DF1
        
    ID                                      Text        atb
    1           Person 1 take acefa and ampicilin      c("acefa","ampicilin")
    2            fortum and acefa are antibiotics      c("fortum","acefa")
    3   Person 3 has no antibiotics but ampicilin      ampicilin

I tried

    DF1 %>%
      mutate(atb = regmatches(Text, regexec(atb, Text)))

and

    DF1 %>%
      mutate(atb = str_extract_all(Text, atb))

but neither works.

However, it works with grepl like this:

    DF1 %>%
      mutate(atb = grepl(atb, Text))

How can I get a column containing the words from the pattern that occur in each text?


Solution

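  • Collapse atb into a single regular expression, joining the names with |, and use strapplyc from the gsubfn package to extract every match in each string: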

    library(dplyr)
    library(gsubfn)
    
    result <- DF1 %>% 
      mutate(atb = strapplyc(Text, paste(atb, collapse = "|")))
    
    str(result$atb)
    

    giving:

    List of 3
     $ : chr [1:2] "acefa" "ampicilin"
     $ : chr [1:2] "fortum" "acefa"
     $ : chr "ampicilin"
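
If adding gsubfn is not desirable, the same idea can probably be expressed in base R. The following is a minimal sketch, not part of the original answer: it rebuilds atb and DF1 from the question, and the variable name pattern is introduced here only for illustration.

```r
# Data from the question
atb <- c("acefa", "ampicilin", "fortum")
DF1 <- data.frame(
  ID = 1:3,
  Text = c("Person 1 take acefa and ampicilin",
           "fortum and acefa are antibiotics",
           "Person 3 has no antibiotics but ampicilin"),
  stringsAsFactors = FALSE
)

# Join the names into one alternation pattern: "acefa|ampicilin|fortum"
# ("pattern" is a name chosen for this sketch)
pattern <- paste(atb, collapse = "|")

# gregexpr() locates every match in each string;
# regmatches() extracts them, returning one character vector per row
DF1$atb <- regmatches(DF1$Text, gregexpr(pattern, DF1$Text))

str(DF1$atb)
```

The result is a list column, as in the strapplyc version. If matches inside longer words are a concern, wrapping the alternation in word boundaries, for example paste0("\\b(", pattern, ")\\b"), should restrict it to whole words. With stringr, str_extract_all(Text, pattern) on the collapsed pattern should behave similarly.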