ruby string non-ascii-characters

How to check if a string contains accented Latin characters like é in Ruby?


Given:

str1 = "é"   # Latin accent
str2 = "囧"  # Chinese character
str3 = "ジ"  # Japanese character
str4 = "e"   # English character

How can I differentiate str1 (accented Latin characters) from the rest of the strings?

Update:

Given:

str1 = "\xE9" # Latin accented é, stored as the single byte \xE9 when read from a file

How would the answer be different?


Solution

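  • I would first strip out all plain ASCII characters with gsub, and then check with a regex whether any Latin-script characters remain. This detects accented Latin characters. Note that \p{Latin} matches any character in the Latin script, so unaccented non-ASCII letters such as ß or æ are detected as well.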

    def latin_accented?(str)
      str.gsub(/\p{Ascii}/, "") =~ /\p{Latin}/
    end
    
    latin_accented?("é")  #=> 0 (truthy)
    latin_accented?("囧") #=> nil (falsy)
    latin_accented?("ジ") #=> nil (falsy)
    latin_accented?("e")  #=> nil (falsy)
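
For the updated case, where the string arrives as the raw byte \xE9, one possible approach is to convert the bytes to UTF-8 before running the same check. This is only a sketch and not part of the original answer: it assumes the file is ISO-8859-1 (Latin-1) encoded, `to_utf8` is a hypothetical helper name, and `match?` is used here simply to get a strict true/false instead of an index.

```ruby
# Same idea as the answer above: drop ASCII, then look for Latin-script characters.
# match? returns true/false rather than a match index.
def latin_accented?(str)
  str.gsub(/\p{Ascii}/, "").match?(/\p{Latin}/)
end

# Hypothetical helper: return a UTF-8 copy of str.
# If the bytes are already valid UTF-8 they are kept as they are.
# Otherwise they are reinterpreted using the fallback encoding.
# ASSUMPTION: the source file is ISO-8859-1 (Latin-1); pass another
# encoding if the file was written differently.
def to_utf8(str, fallback = Encoding::ISO_8859_1)
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  str.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

raw = "\xE9"                    # a single Latin-1 byte, not valid UTF-8
latin_accented?(to_utf8(raw))   #=> true
latin_accented?(to_utf8("e"))   #=> false
```

Converting first matters because a lone \xE9 byte is not a valid UTF-8 sequence, so the Unicode property classes cannot be applied to it directly. If the real encoding of the file is known, passing it to `File.read` via the `encoding:` option avoids the guesswork in the fallback.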