Search code examples
pythonunicodeencodingutf-8

Python UTF-8 Lowercase Turkish Specific Letter


with using python 2.7:

>myCity = 'Isparta'
>myCity.lower()
>'isparta'
#-should be-
>'ısparta'

tried some decoding, (like, myCity.decode("utf-8").lower()) but could not find how to do it.

how can lower this kinds of letters? ('I' > 'ı', 'İ' > 'i' etc)

EDIT: In Turkish, lower case of 'I' is 'ı'. Upper case of 'i' is 'İ'


Solution

  • Some have suggested using the tr_TR.utf8 locale. At least on Ubuntu, perhaps related to this bug, setting this locale does not produce the desired result:

    import locale
    locale.setlocale(locale.LC_ALL, 'tr_TR.utf8')
    
    myCity = u'Isparta İsparta'
    print(myCity.lower())
    # isparta isparta
    

    So if this bug affects you, as a workaround you could perform this translation yourself:

    lower_map = {
        ord(u'I'): u'ı',
        ord(u'İ'): u'i',
        }
    
    myCity = u'Isparta İsparta'
    lowerCity = myCity.translate(lower_map)
    print(lowerCity)
    # ısparta isparta
    

    prints

    ısparta isparta