Search code examples
pythonstringencodingcharacter-encodingdecoding

Replace language specific characters in python with English letters


Is there any way in Python 3 to replace general language specific characters for English letters?
For example, I've got function get_city(IP), that returns city name connected with given IP. It connects to external database, so I can't change the way it encodes, I am just getting value from database.
I would like to do something like:

city = "České Budějovice"
city = clear_name(city)
print(city) #should return "Ceske Budejovice"

Here I used Czech language, but in general it should work on any non Asian langauge.


Solution

  • Try unidecode:

    # coding=utf-8
    from unidecode import unidecode
    
    city = "České Budějovice"
    print(unidecode(city))
    

    Prints Ceske Budejovice as desired (assuming your post has a typo).

    Note: if you're using Python 2.x, you'll need to decode the string before passing it to unidecode, e.g. unidecode(city.decode('utf-8'))