Search code examples
pythonpython-3.xunicodeascii

How to convert fancy/artistic unicode text to ASCII?


I have a unicode string like "𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊" and would like to convert it to the ASCII form "thug life".

I know I can achieve this in Python by

import unidecode
print(unidecode.unidecode('𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊'))
// thug life

However, this would asciify also other unicode characters (such as Chinese/Japanese characters, emojis, accented characters, etc.), which I want to preserve.

Is there a way to detect these type of "artistic" unicode characters?

Some more examples:

𝓽𝓱𝓾𝓰 𝓵𝓲𝓯𝓮

𝓉𝒽𝓊𝑔 𝓁𝒾𝒻𝑒

𝕥𝕙𝕦𝕘 𝕝𝕚𝕗𝕖

thug life

Thanks for your help!


Solution

  • import unicodedata
    strings = [
      '𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊',
      '𝓽𝓱𝓾𝓰 𝓵𝓲𝓯𝓮',
      '𝓉𝒽𝓊𝑔 𝓁𝒾𝒻𝑒',
      '𝕥𝕙𝕦𝕘 𝕝𝕚𝕗𝕖',
      'thug life']
    for x in strings:
      print(unicodedata.normalize( 'NFKC', x), x)
    

    Output: .\62803325.py

    thug life 𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊
    thug life 𝓽𝓱𝓾𝓰 𝓵𝓲𝓯𝓮
    thug life 𝓉𝒽𝓊𝑔 𝓁𝒾𝒻𝑒
    thug life 𝕥𝕙𝕦𝕘 𝕝𝕚𝕗𝕖
    thug life thug life
    

    Resources: