Given:
str1 = "é" # Latin accent
str2 = "囧" # Chinese character
str3 = "ジ" # Japanese character
str4 = "e" # English character
How can I differentiate str1 (a Latin accented character) from the rest of the strings?
Update:
Given
str1 = "\xE9" # Latin accent é, actually stored as \xE9 when read from a file
How would the answer be different?
I would first strip out all plain ASCII characters with gsub, and then check with a regex whether any Latin characters remain. Whatever Latin-script characters survive the strip must be non-ASCII, so this detects the accented Latin characters.
def latin_accented?(str)
  str.gsub(/\p{Ascii}/, "") =~ /\p{Latin}/
end
latin_accented?("é") #=> 0 (truthy)
latin_accented?("囧") #=> nil (falsy)
latin_accented?("ジ") #=> nil (falsy)
latin_accented?("e") #=> nil (falsy)
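
For the update: a lone \xE9 byte is not valid UTF-8, so the check above cannot be applied to it directly. Here is a minimal sketch, assuming the file is ISO-8859-1 (Latin-1) encoded, where byte 0xE9 is é. The helper name latin_accented_bytes? and the source_encoding parameter are my own, not part of any library; if your file uses a different encoding, pass that instead.

```ruby
# Hypothetical helper: reinterprets the raw bytes in an assumed source
# encoding (ISO-8859-1 by default), transcodes to UTF-8, then applies the
# same "strip ASCII, look for Latin" check as latin_accented?.
def latin_accented_bytes?(str, source_encoding = Encoding::ISO_8859_1)
  # dup so the caller's string (possibly frozen) is not modified
  utf8 = str.dup.force_encoding(source_encoding).encode(Encoding::UTF_8)
  !!(utf8.gsub(/\p{Ascii}/, "") =~ /\p{Latin}/)
end

latin_accented_bytes?("\xE9") #=> true
latin_accented_bytes?("e")    #=> false
```

If you control how the file is opened, you could instead read it with an explicit encoding, for example File.read(path, encoding: "ISO-8859-1:UTF-8"), and then call latin_accented? unchanged.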