python-3.x unicode utf-8 unidecoder

How to customize unidecode?


I'm using the unidecode module to replace non-ASCII characters with their closest ASCII equivalents. However, there are some characters, for example Greek letters and symbols like Å, that I want to preserve. How can I achieve this?

For example,

from unidecode import unidecode
test_str = 'α, Å ©'
unidecode(test_str)

gives the output a, A (c), while what I want is α, Å (c).


Solution

  • Run unidecode on each character individually, and keep a whitelist set of characters that bypass it.

    >>> import string
    >>> import unidecode
    >>> whitelist = set(string.printable + 'αÅ')
    >>> test_str = 'α, Å ©'
    >>> ''.join(ch if ch in whitelist else unidecode.unidecode(ch) for ch in test_str)
    'α, Å (c)'
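
The same idea can be wrapped in a small reusable helper. The following is only a sketch: the function name `transliterate_except` and its `keep` and `transliterate` parameters are made up for illustration and are not part of the unidecode package. It assumes that ASCII characters should always pass through untouched, and it imports unidecode lazily so that a different per-character transliterator can be supplied instead.

```python
def transliterate_except(text, keep, transliterate=None):
    """Transliterate text character by character, preserving those in `keep`.

    `keep` is any iterable of single characters to leave untouched.
    `transliterate` is a hypothetical hook: a callable mapping one
    character to its replacement string. When omitted, the third-party
    unidecode package is assumed to be installed and is used.
    """
    if transliterate is None:
        # Imported lazily so the helper still works with a custom callable
        # when unidecode is not available.
        from unidecode import unidecode as transliterate

    keep = frozenset(keep)
    pieces = []
    for ch in text:
        if ord(ch) < 128 or ch in keep:
            # ASCII and whitelisted characters bypass transliteration.
            pieces.append(ch)
        else:
            pieces.append(transliterate(ch))
    return ''.join(pieces)
```

With unidecode installed, `transliterate_except('α, Å ©', 'αÅ')` should behave like the one-liner above. Note that the membership test works on single code points, so if the input might contain Å in decomposed form (A followed by a combining ring), normalizing it first with `unicodedata.normalize('NFC', text)` is one way to make the whitelist match.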