python json string unicode cyrillic

Python: Cyrillic handling


I got this data returned from an API: b'\\u041a\\u0435\\u0439\\u0442\\u043b\\u0438\\u043d\\u043f\\u0440\\u043e'. I know for sure that the data is in Russian. I am guessing these values are the Unicode representation of the Cyrillic letters?

The data returned was a byte array.

How can I convert that into a readable Cyrillic string? Basically, I need a way to convert this kind of data into human-readable text.

EDIT: Yes this is JSON data. Forgot to mention, sorry.


Solution

  • Chances are you have JSON data; JSON uses \uhhhh escape sequences to represent Unicode codepoints. Use the json.loads() function on unicode (decoded) data to produce a Python string:

    import json
    
    string = json.loads(data.decode('utf8'))
    

    UTF-8 is the default JSON encoding; check your response headers (if you are using an HTTP-based API) to see if a different encoding was used.

    Demo:

    >>> import json
    >>> json.loads(b'"\\u041a\\u0435\\u0439\\u0442\\u043b\\u0438\\u043d\\u043f\\u0440\\u043e"'.decode('utf8'))
    'Кейтлинпро'
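
The same steps can be wrapped in a small helper. This is only a sketch: `decode_json_bytes` and its `encoding` parameter are hypothetical names made up for illustration, and the payload is assumed to be a complete JSON document (here, a quoted JSON string). The last part shows `json.dumps()` with `ensure_ascii=False`, which keeps the Cyrillic readable if you need to serialize the value again.

```python
import json


def decode_json_bytes(data: bytes, encoding: str = "utf-8"):
    """Decode a JSON payload received as bytes into Python objects.

    `encoding` is an assumption: pass whatever the response headers report
    if the API does not use UTF-8.
    """
    # Decode the bytes to text first; the JSON parser then resolves
    # the \uXXXX escape sequences into real characters.
    return json.loads(data.decode(encoding))


# Raw bytes as an API might return them: a JSON string with \uXXXX escapes.
payload = b'"\\u041a\\u0435\\u0439\\u0442\\u043b\\u0438\\u043d\\u043f\\u0440\\u043e"'

text = decode_json_bytes(payload)
print(text)  # Кейтлинпро

# By default json.dumps() escapes non-ASCII characters again;
# ensure_ascii=False keeps the Cyrillic letters as-is in the output.
readable_json = json.dumps(text, ensure_ascii=False)
print(readable_json)  # "Кейтлинпро"
```

On Python 3.6 and later, `json.loads()` should also accept the bytes object directly, so the explicit `.decode()` call mainly matters on older versions or when the encoding is something other than UTF-8, UTF-16 or UTF-32.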