python, python-3.x, utf-8, decode

Decode byte string to Cyrillic in Python


I have a byte string like this, which should read Сравнение in Cyrillic characters:

a = b'&#1057;&#1088;&#1072;&#1074;&#1085;&#1077;&#1085;&#1080;&#1077;'

Decoding it as UTF-8 doesn't help:

a = b'&#1057;&#1088;&#1072;&#1074;&#1085;&#1077;&#1085;&#1080;&#1077;'
a.decode("utf-8") # returns the same '&#1057;&#1088;...' string

Which encoding is this, and how do I decode the string?

I'm using Google Colab with Python 3.10.12.

An online decoder, after applying auto-decode, says it must be decoded from UTF-8 to UTF-8.
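A quick check suggests why the online decoder reports "UTF-8 to UTF-8": the bytes appear to be plain ASCII, which is already valid UTF-8, so decoding changes nothing. The sketch below assumes the data uses the decimal `&#NNNN;` form of HTML numeric character references; it only inspects the bytes and does not convert anything yet.

```python
# Assumption: the byte string holds decimal HTML numeric character references.
a = b"&#1057;&#1088;&#1072;&#1074;&#1085;&#1077;&#1085;&#1080;&#1077;"

# Every byte is ASCII, so the data is already valid UTF-8 (and valid ASCII).
print(a.isascii())  # True

text = a.decode("utf-8")
print(text[:7])     # &#1057;

# Decoding only turns bytes into str; the entity text itself is unchanged,
# and ASCII and UTF-8 give the same result for these bytes.
print(text == a.decode("ascii"))  # True
```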


Solution

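  • The bytes are plain ASCII text containing HTML numeric character references (&#NNNN;), so there is no other encoding to undo. Decode the bytes to str, then convert the references with html.unescape: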

    import html
    
    a = b"&#1057;&#1088;&#1072;&#1074;&#1085;&#1077;&#1085;&#1080;&#1077;"
    decoded_string = html.unescape(a.decode("utf-8"))
    
    print(decoded_string)
    

    Prints:

    Сравнение
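To illustrate what `html.unescape` does here, the sketch below converts the decimal references by hand with a regular expression and compares the result. The helper `unescape_decimal_refs` is hypothetical and only handles the decimal `&#NNNN;` form; `html.unescape` also covers hex (`&#x421;`) and named (`&amp;`) references, so it remains the better choice in practice. The last line runs the conversion in reverse: encoding to ASCII with `errors="xmlcharrefreplace"` is one plausible way such bytes are produced.

```python
import html
import re

a = b"&#1057;&#1088;&#1072;&#1074;&#1085;&#1077;&#1085;&#1080;&#1077;"

# Hypothetical helper: replace each "&#NNNN;" with the character whose
# Unicode code point is NNNN (decimal references only).
def unescape_decimal_refs(s: str) -> str:
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), s)

manual = unescape_decimal_refs(a.decode("ascii"))
print(manual)                                      # Сравнение
print(manual == html.unescape(a.decode("ascii")))  # True

# Reverse direction: non-ASCII characters become decimal references.
print("Сравнение".encode("ascii", errors="xmlcharrefreplace") == a)  # True
```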