r, regex, regular-language

Detecting words with accents in R


Is there a way, using grepl or another function, to detect all words that contain an accent? I don't want to ignore or strip the accents, which has been asked many times; I just want to detect every word that has an accent in it.

Thanks


Solution

  • Another solution - detect non-ASCII characters:

    library(stringr)
    # TRUE for every element that contains at least one character outside printable ASCII
    str_detect(txt, "[^ -~]")
    #> [1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE
    

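    where [^ -~] is a negated character class. Without the negation, [ -~] matches the printable ASCII characters (space through tilde, code points 32 to 126), so [^ -~] matches any character outside that range. Note that this flags every non-ASCII character, not only accented letters, so symbols such as € (and control characters such as tabs) match as well.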

    Or, using dplyr syntax:

    library(dplyr)
    library(stringr)
    # keep only the rows whose txt contains a non-ASCII character
    data.frame(txt) %>%
      filter(str_detect(txt, "[^ -~]"))
    #>          txt
    #> 1 aaaaaaaaaä
    #> 2   cccccccç
    #> 3    ccccccč
    #> 4     nnnnnñ
    #> 5       ynàn
    

    Data:

    txt <- c("aaaaaaaaaä", "cccccccç", "ccccccč", "abc", "nnnnnñ", "xXXXz", "ynàn")
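Since the question mentions grepl, here is a rough sketch of the same check in base R, followed by a stricter variant that looks for actual diacritics instead of any non-ASCII character. The stricter variant is not part of the answer above; it assumes the stringi package (installed as a dependency of stringr) and works by decomposing each string to Unicode NFD form and searching for combining marks. The names has_non_ascii and has_diacritic are made up for illustration, and txt is the same vector as in Data, written with \u escapes so the script does not depend on the file encoding.

```r
library(stringi)

# Same values as the Data vector above, written with \u escapes.
txt <- c("aaaaaaaaa\u00e4", "ccccccc\u00e7", "cccccc\u010d", "abc",
         "nnnnn\u00f1", "xXXXz", "yn\u00e0n")

# Base R version of the stringr call: TRUE where a string contains any
# character outside the printable ASCII range (space through tilde).
has_non_ascii <- grepl("[^ -~]", txt, perl = TRUE)

# Stricter check (an assumption, not from the original answer):
# NFD splits a letter such as U+00E4 into "a" plus a combining diaeresis,
# and \p{Mn} matches that combining (nonspacing) mark.
has_diacritic <- stri_detect_regex(stri_trans_nfd(txt), "\\p{Mn}")

# The two checks disagree on non-ASCII characters that carry no accent,
# for example the euro sign.
euro <- "10 \u20ac"
euro_non_ascii <- grepl("[^ -~]", euro, perl = TRUE)
euro_diacritic <- stri_detect_regex(stri_trans_nfd(euro), "\\p{Mn}")
```

On this data both checks should flag the same five elements. The NFD approach only catches letters that Unicode decomposes into a base letter plus a combining mark, so letters such as ø or ł, which have no such decomposition, would not be detected by it.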