r regex special-characters

Regular expression for non-English characters


I need to check if some strings contain any non-English characters.

x = c('Kält', 'normal', 'normal with, punctuation ~-+!', 'normal with number 1234')
grep(pattern = ??, x) # Expected output: 1

Solution

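  • You may use the PCRE pattern [^[:ascii:]], which matches any character outside the ASCII range. It needs perl=TRUE so that R uses the PCRE engine: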

    x = c('Kält', 'normal', 'normal with, punctuation ~-+!', 'normal with number 1234')
    grep(pattern = "[^[:ascii:]]", x, perl=TRUE)              # indices of the matching elements
    grep(pattern = "[^[:ascii:]]", x, value=TRUE, perl=TRUE)  # the matching strings themselves
    

    Output:

    [1] 1
    [1] "Kält"
    

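If a TRUE/FALSE flag per string is more convenient than indices, the same pattern can be passed to grepl(). The following is only a minimal sketch: has_non_ascii is a hypothetical helper name, not a base R function, and the pattern flags any non-ASCII character, so an English word spelled with an accent (such as "naïve") would be flagged as well.

```r
# Hypothetical helper (not part of base R): returns TRUE for each string
# that contains at least one character outside the ASCII range.
has_non_ascii <- function(strings) {
  grepl("[^[:ascii:]]", strings, perl = TRUE)
}

# "K\u00e4lt" is "Kält"; the escape keeps the example independent of the
# encoding of the script file.
x <- c("K\u00e4lt", "normal", "normal with, punctuation ~-+!",
       "normal with number 1234")

flags <- has_non_ascii(x)
print(flags)         # TRUE FALSE FALSE FALSE
print(which(flags))  # 1, the same index grep() returns
print(x[!flags])     # the three ASCII-only strings
```

Subsetting with x[!flags] keeps only the ASCII-only strings, which can be handy when the check is a filtering step rather than a lookup.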