I have a byte string like this; it should be Сравнение in Cyrillic characters:
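a = b'&#1057;&#1088;&#1072;&#1074;&#1085;&#1077;&#1085;&#1080;&#1077;'
Decoding it as UTF-8 doesn't help:
a = b'&#1057;&#1088;&#1072;&#1074;&#1085;&#1077;&#1085;&#1080;&#1077;'
a.decode("utf-8") # returns the same &#1057;&#1088;... string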
Which encoding is this, and how do I decode the string?
I'm using Google Colab with Python 3.10.12.
This online decoder, after applying auto-decode, says it must be decoded from UTF-8 to UTF-8.
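Why UTF-8 decoding changes nothing: assuming the bytes consist of HTML numeric character references of the form `&#NNNN;` (the format `html.unescape` handles), every byte is plain ASCII, so it is already valid UTF-8 and decoding just yields the same `&#...;` text. Each reference holds a decimal Unicode code point, which `chr()` turns into a character. A minimal check:

```python
# Assumption: the bytes are HTML numeric character references (&#NNNN;)
a = b'&#1057;&#1088;&#1072;&#1074;&#1085;&#1077;&#1085;&#1080;&#1077;'

# Every byte is ASCII, so UTF-8 and ASCII decoding give identical text
text = a.decode("utf-8")
print(a.isascii())                 # True
print(text == a.decode("ascii"))   # True

# Each reference is a decimal code point: 1057 is U+0421, CYRILLIC CAPITAL LETTER ES
print(chr(1057), hex(1057))        # С 0x421
```

So this is not a byte-level encoding problem at all; the text needs HTML unescaping after decoding.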
You can use html.unescape:
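import html
a = b'&#1057;&#1088;&#1072;&#1074;&#1085;&#1077;&#1085;&#1080;&#1077;'
# Decode the ASCII bytes to str, then expand the &#NNNN; references
decoded_string = html.unescape(a.decode("utf-8"))
print(decoded_string)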
Prints:
Сравнение
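To see what `html.unescape` is doing here, the sketch below expands only numeric references (decimal `&#NNNN;` and hex `&#xHHHH;`) with a regular expression. The helper name `decode_numeric_refs` is hypothetical; `html.unescape` also handles named entities such as `&amp;` and out-of-range code points, which this sketch does not.

```python
import html
import re

# Hypothetical helper: matches &#xHHHH; / &#XHHHH; (hex) or &#NNNN; (decimal)
_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")

def decode_numeric_refs(s: str) -> str:
    """Replace numeric character references with the characters they name."""
    def repl(m: re.Match) -> str:
        hex_digits, dec_digits = m.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return chr(code_point)  # no validation of out-of-range code points
    return _REF.sub(repl, s)

a = b'&#1057;&#1088;&#1072;&#1074;&#1085;&#1077;&#1085;&#1080;&#1077;'
s = a.decode("utf-8")
print(decode_numeric_refs(s))                        # Сравнение
print(decode_numeric_refs(s) == html.unescape(s))    # True
```

For real input, `html.unescape` remains the simpler and more complete choice; the regex version only illustrates that each reference maps directly to one Unicode code point.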