
Detect duplicated words within a string


In the character column below (from a data frame), I want to keep the strings in which TRUE appears at least twice. I could use strsplit and then look for duplicates, but is there a more direct way?

head(df$Filter)
[1] "FALSE_TRUE_FALSE_FALSE" "FALSE_TRUE_FALSE_FALSE" "FALSE_TRUE_TRUE_FALSE"  "FALSE_TRUE_FALSE_FALSE" "FALSE_TRUE_FALSE_FALSE"
[6] "FALSE_TRUE_FALSE_FALSE"

Desired output for this example:

FALSE_TRUE_TRUE_FALSE

Solution

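  • We can use str_count from stringr to count the occurrences of "TRUE" in each string, then keep the rows where the count is at least 2:

    library(dplyr)
    library(stringr)

    # Keep rows where "TRUE" occurs at least twice in Filter
    df %>%
        filter(str_count(Filter, fixed("TRUE")) >= 2)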
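If you would rather avoid the dplyr/stringr dependency, here is a minimal base R sketch. It assumes, as in the example, that the values contain only TRUE/FALSE tokens separated by "_"; the vector `x` is a hypothetical stand-in for `df$Filter`. It shows two variants: a single regex, and the strsplit approach mentioned in the question.

```r
# Hypothetical stand-in for df$Filter
x <- c("FALSE_TRUE_FALSE_FALSE", "FALSE_TRUE_TRUE_FALSE", "TRUE_FALSE_FALSE_TRUE")

# Option 1: one regex that only matches when "TRUE" appears at least twice
has_two_true <- grepl("TRUE.*TRUE", x)

# Option 2: split on "_" and count tokens that are exactly "TRUE"
n_true <- vapply(strsplit(x, "_", fixed = TRUE),
                 function(parts) sum(parts == "TRUE"),
                 integer(1))

x[has_two_true]  # strings with at least two TRUE
x[n_true >= 2]   # same result via token counting
```

To filter the data frame itself, use the logical vector as a row index, e.g. `df[grepl("TRUE.*TRUE", df$Filter), , drop = FALSE]`. The token-counting variant is stricter, because it matches whole tokens rather than substrings.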