Filter and remove sentences that begin and end with a certain word in R


I am trying to filter sentences that begin and end with a certain word. I have some text data, for example:

data <- c("No comment", "Nothing", "No clue", "No", "No", "I have no clue", "Noe")

Now I want to detect sentences that begin and end with the word "No". I tried

library(stringr)
str_detect(data, "^No", negate = FALSE)

but obviously sentences 1, 2 and 3 and, surprisingly, also sentence 7 get detected, because "^No" only requires the string to start with the letters "No".

I don't know how to tell R to only detect the sentence if and only if it begins AND ends with the word "No".

Does anybody have an idea? I am new here, so I hope my problem description is clear.

Looking forward to hearing from you all!


Solution

  • data <- c("No comment", "Nothing", "No clue", "No", "No", "I have no clue", "Noe")
    data <- c(data, "No and No", "No and YesNo")
    grepl("^No(.*\\bNo)?$", data)
    # [1] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE
    

    If the "YesNo" should indeed match, then remove the \\b from the regex.
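As a rough illustration of that difference, the sketch below runs both variants on the same data and shows where they disagree. The variable names `strict` and `loose` are only illustrative and not part of the original answer.

```r
data <- c("No comment", "Nothing", "No clue", "No", "No", "I have no clue", "Noe",
          "No and No", "No and YesNo")

# With \\b, a trailing "YesNo" does not count as the word "No"
strict <- grepl("^No(.*\\bNo)?$", data)
# Without \\b, the final "No" no longer has to be a separate word
loose <- grepl("^No(.*No)?$", data)

# Elements on which the two patterns disagree
data[loose != strict]
# [1] "No and YesNo"
```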

    Regex:

    • ^No - starts with the literal No;
    • (...)?$ - optional match at the end of the string; this means that both "No" and "No something No" will match;
    • .*\\bNo - anything followed by a word boundary and the literal No.
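
Since the question also asks about filtering and removing, here is a minimal sketch of how the logical vector returned by grepl() could be used to subset the data. It assumes the same regex as above. The names `pattern`, `hits`, `kept` and `remaining` are illustrative only.

```r
data <- c("No comment", "Nothing", "No clue", "No", "No", "I have no clue", "Noe",
          "No and No", "No and YesNo")

# Same regex as in the answer above
pattern <- "^No(.*\\bNo)?$"

# TRUE where a sentence begins and ends with the word "No"
hits <- grepl(pattern, data)

# Keep only the sentences that match
kept <- data[hits]

# Remove the sentences that match and keep the rest
remaining <- data[!hits]
```

Negating the logical vector with `!` is what turns "detect" into "remove". With stringr, `str_detect(data, pattern, negate = TRUE)` should give the same selection as `!hits`.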