
Remove entries from string vector containing specific characters in R


I've got two character vectors:

x <- c("a", "b", "c", "kt")
y <- c("abs", "kot", "ccf", "okt", "kk", "y")

I need to use x to remove entries from y, so that only the strings that contain none of x's entries as a substring remain, like this:

y <- c("kot", "kk", "y")

The code should work for vectors x and y of any length.

So far I've tried gsub and grepl, but each of them accepts only a single pattern. I also tried writing a loop, but the problem turned out to be harder than I expected. The more elegant the solution, the better, but you can assume that x and y have at most 200 entries each.


Solution

  • We can combine x into a single regular expression with paste0(x, collapse = "|"), use grep to find the values in y that match it, and exclude them using ! together with %in%

    y[!y %in% grep(paste0(x, collapse = "|"), y, value = TRUE)]
    
    #[1] "kot" "kk"  "y"  
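To see what the one-liner does, it can help to print the intermediate values. This sketch assumes the example vectors from the question; the names pattern and matched are only illustrative.

```r
x <- c("a", "b", "c", "kt")
y <- c("abs", "kot", "ccf", "okt", "kk", "y")

# Collapse x into one alternation regex: it matches if ANY element occurs
pattern <- paste0(x, collapse = "|")
print(pattern)
#[1] "a|b|c|kt"

# Entries of y that match at least one alternative
matched <- grep(pattern, y, value = TRUE)
print(matched)
#[1] "abs" "ccf" "okt"

# Keep everything that was not matched
y[!y %in% matched]
#[1] "kot" "kk"  "y"
```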
    

    Or, more simply, with grepl, which returns a logical vector that can be used to index y

    y[!grepl(paste0(x, collapse = "|"), y)]
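The logical vector produced by grepl can be inspected on its own. A minimal sketch, again assuming the question's example vectors (mask is an illustrative name):

```r
x <- c("a", "b", "c", "kt")
y <- c("abs", "kot", "ccf", "okt", "kk", "y")

# grepl returns one TRUE/FALSE per element of y
mask <- grepl(paste0(x, collapse = "|"), y)
print(mask)
#[1]  TRUE FALSE  TRUE  TRUE FALSE FALSE

# Negate the mask to keep only the non-matching entries
y[!mask]
#[1] "kot" "kk"  "y"
```

Because the mask is already aligned with y, the extra %in% lookup from the grep version is not needed.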
    

    A more concise version uses grep with the invert and value parameters

    grep(paste0(x, collapse = "|"), y, invert = TRUE, value = TRUE)
    #[1] "kot" "kk"  "y"
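One caveat applies to all three versions: paste0(x, collapse = "|") builds a regular expression, so an element of x containing metacharacters such as ".", "+" or "(" is not matched literally, and an empty x produces the pattern "", which matches every string and therefore removes all of y. If the entries of x should be treated as plain substrings, one option is to test each of them separately with fixed = TRUE and OR the results together. The helper drop_containing below is hypothetical (not a base R function), a sketch rather than a definitive implementation:

```r
# Hypothetical helper (not part of base R): drop every entry of y that
# contains at least one element of x as a literal substring.
drop_containing <- function(y, x) {
  # One logical vector per element of x; fixed = TRUE turns off regex matching
  hits_per_pattern <- lapply(x, function(p) grepl(p, y, fixed = TRUE))
  # OR the vectors together; init is all FALSE so an empty x removes nothing
  hits <- Reduce(`|`, hits_per_pattern, init = logical(length(y)))
  y[!hits]
}

x <- c("a", "b", "c", "kt")
y <- c("abs", "kot", "ccf", "okt", "kk", "y")
drop_containing(y, x)
#[1] "kot" "kk"  "y"

# "." is matched literally here, so only "a.b" is removed
drop_containing(c("a.b", "ab"), ".")
#[1] "ab"

# An empty x leaves y unchanged
drop_containing(y, character(0))
#[1] "abs" "kot" "ccf" "okt" "kk"  "y"
```

This makes one grepl call per element of x, which is still cheap for vectors of around 200 entries.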