I have a Unicode string and need to translate it into pure ASCII.
t = "\xf0\x9d\x97\x94\xf0\x9d\x98\x82\xf0\x9d\x97\xb4\xf0\x9d\x98\x82\xf0\x9d\x98\x80\xf0\x9d\x98\x81"
My first try was unsuccessful:
t.encode('ASCII', invalid: :replace, undef: :replace, replace: '')
=> ""
I was able to translate the string using Unicode normalization:
t.unicode_normalize :nfkd
=> "August"
Is there a better solution? It should be gem-independent and work with Ruby 2.x (String#unicode_normalize is unavailable on 2.1 and earlier versions).
You could translate the Unicode characters to their ASCII equivalents via tr:
t.tr("𝗔-𝗭𝗮-𝘇", 'A-Za-z')
#=> "August"
or, using their codepoints:
t.tr("\u{1D5D4}-\u{1D5ED}\u{1D5EE}-\u{1D607}", "A-Za-z")
#=> "August"
Make sure that t is UTF-8 encoded.
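If the string arrives tagged with a different encoding (for example ASCII-8BIT after a binary read), tr will not see the multi-byte characters. A minimal sketch of a guard, assuming the underlying bytes really are UTF-8 and only the encoding tag is wrong; the helper name utf8_for_tr is hypothetical:

```ruby
# Hypothetical helper: returns a UTF-8 tagged copy of str so that tr can match
# the multi-byte characters. Assumes the bytes are already UTF-8 and only the
# encoding label is wrong; it does not transcode from other encodings.
def utf8_for_tr(str)
  return str if str.encoding == Encoding::UTF_8 && str.valid_encoding?

  copy = str.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, 'bytes are not valid UTF-8' unless copy.valid_encoding?
  copy
end

raw = "\xf0\x9d\x97\x94".b # same bytes as the first character of t, tagged ASCII-8BIT
utf8_for_tr(raw).tr("\u{1D5D4}-\u{1D5ED}", 'A-Z')
```

If the data is actually in another encoding such as ISO-8859-1, String#encode would be the right tool instead of force_encoding.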
Also note that there are other stylized forms in the Mathematical Alphanumeric Symbols block which you might want to translate accordingly.
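To cover several of those styles at once, the same tr call can be repeated per style. The following is only a sketch: STYLE_STARTS and ascii_letters are names I made up, and the list is limited to styles whose 52 letters (A-Z followed by a-z) occupy one unbroken run of codepoints. Styles with gaps in the block, such as italic, script, Fraktur and double-struck, are deliberately left out and would need their own mapping.

```ruby
# Codepoint of the capital "A" for a few letter styles whose A-Z and a-z
# are contiguous in the Mathematical Alphanumeric Symbols block.
# (Hypothetical constant; the list is intentionally incomplete.)
STYLE_STARTS = [
  0x1D400, # bold
  0x1D468, # bold italic
  0x1D5A0, # sans-serif
  0x1D5D4, # sans-serif bold
  0x1D608, # sans-serif italic
  0x1D63C, # sans-serif bold italic
  0x1D670  # monospace
].freeze

# Hypothetical helper: maps the styled letters listed above to plain ASCII
# letters and leaves every other character untouched.
def ascii_letters(str)
  STYLE_STARTS.reduce(str) do |s, start|
    upper_a, upper_z = [start].pack('U'), [start + 25].pack('U')
    lower_a, lower_z = [start + 26].pack('U'), [start + 51].pack('U')
    s.tr("#{upper_a}-#{upper_z}#{lower_a}-#{lower_z}", 'A-Za-z')
  end
end

t = "\xf0\x9d\x97\x94\xf0\x9d\x98\x82\xf0\x9d\x97\xb4\xf0\x9d\x98\x82\xf0\x9d\x98\x80\xf0\x9d\x98\x81"
ascii_letters(t)
#=> "August"
```

This only uses pack and tr, so it does not depend on String#unicode_normalize. It handles letters only; the styled digits in the same block are not translated.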