Tags: r, string, matching

Return matched pattern from vector - R


I'm trying to write code that adds a new column to a data frame, containing whichever pattern matched the corresponding cell in another column.

For example, I have a column where each value is a string mixing useful and non-useful information, like this:

data.frame(A = c("148apple32394", "386pear3", "23banana3808"))

              A
1 148apple32394
2      386pear3
3  23banana3808

I would like to compare this column to a vector of possible patterns, i.e.:

patterns <- c("apple", "banana", "pear")

and return a new column containing whatever pattern matched, the end result being:

              A      B
1 148apple32394  apple
2      386pear3   pear
3  23banana3808 banana

I know grep() doesn't work well with a vector of patterns, so is there another function that might work? Ideally, I would like to implement the solution using mutate().

Thanks!


Solution

  • You could use str_extract() with the patterns collapsed into a single regular expression by |, which detects and extracts whichever pattern matches, like this:

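    df <- data.frame(A = c("148apple32394", "386pear3", "23banana3808"))
    patterns <- c("apple", "banana", "pear")
    library(dplyr)
    library(stringr)
    # paste(..., collapse = "|") builds the single regex "apple|banana|pear";
    # str_extract() returns the first match in each string, or NA if none matches
    df %>%
      mutate(B = str_extract(A, paste(patterns, collapse = "|")))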
    #>               A      B
    #> 1 148apple32394  apple
    #> 2      386pear3   pear
    #> 3  23banana3808 banana
    

    Created on 2023-03-10 with reprex v2.0.2
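
If you would rather avoid extra packages, the same idea can be sketched in base R with regexpr() and regmatches(). This is only a minimal sketch: the helper name extract_first_pattern is made up for illustration, and it assumes the patterns contain no regex metacharacters (they are treated as regular expressions, so characters such as . or + would need escaping).

```r
# Hypothetical helper (not part of any package): return the first pattern
# found in each element of x, or NA when nothing matches.
extract_first_pattern <- function(x, patterns) {
  x <- as.character(x)
  # Collapse the candidates into one alternation, e.g. "apple|banana|pear".
  regex <- paste(patterns, collapse = "|")
  # regexpr() reports the position of the first match per element (-1 if none).
  m <- regexpr(regex, x)
  hit <- !is.na(m) & m > -1L
  # Start with NA everywhere, then fill in the elements that matched.
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)
  out
}

df <- data.frame(A = c("148apple32394", "386pear3", "23banana3808"))
patterns <- c("apple", "banana", "pear")
df$B <- extract_first_pattern(df$A, patterns)
print(df)
```

For the example data, B comes out as apple, pear, banana. Because the helper returns a plain character vector, it should also work inside mutate(), e.g. mutate(B = extract_first_pattern(A, patterns)).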