I receive JSON data from an API:
json = {"lat": null, "body_text": "@edinburgh \u2764\ufe0f", "deduplicated_time": "2020-11-05T15:38:11.744710"}
I load the JSON message in Python:
msg_body = json.loads(msg.body, strict=False)
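For reference, the strict=False argument in that call only relaxes one rule: it lets json.loads accept raw control characters (code points 0 to 31, such as a literal newline) inside JSON strings. It does not affect how \u escapes are decoded. A small sketch with a made-up payload (the raw variable is illustrative only):

```python
import json

# A JSON document whose string value contains a raw (unescaped) newline,
# i.e. a control character inside a string literal.
raw = '{"body_text": "first line\nsecond line"}'

# With the default strict=True, control characters inside strings are rejected.
try:
    json.loads(raw)
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False

# With strict=False, they are accepted and kept in the decoded str.
lenient = json.loads(raw, strict=False)

print(strict_ok)                   # False
print(repr(lenient["body_text"]))  # 'first line\nsecond line'
```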
I use VaderSentiment to extract the sentiment from the text in the body_text field of the JSON message.
The problem is that when the text contains the red heart ❤ emoji, which appears as \u2764\ufe0f, VADER fails to predict the correct emotion. Their page says that VADER translates UTF-8-encoded emojis such as 💘, 💋 and 😁. I believe that \u2764\ufe0f is not UTF-8. How can I convert it to UTF-8 using Python?
If the following emoji page is correct, \u2764\ufe0f is the "python src" encoding.
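To see what \u2764\ufe0f actually is, it helps to separate code points from encodings. This standard-library sketch shows that the escape denotes two code points, U+2764 HEAVY BLACK HEART followed by U+FE0F VARIATION SELECTOR-16, and that UTF-8 is just one way of turning them into bytes:

```python
import unicodedata

# The escape sequence as written in Python source (or in JSON text)
# denotes two Unicode code points, not bytes.
heart = "\u2764\ufe0f"

for ch in heart:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+2764 HEAVY BLACK HEART
# U+FE0F VARIATION SELECTOR-16

# UTF-8 is one way of encoding those code points as bytes.
utf8_bytes = heart.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x9d\xa4\xef\xb8\x8f'

# The escaped form and the named-character form are the same Python str.
print(heart == "\N{HEAVY BLACK HEART}\N{VARIATION SELECTOR-16}")  # True
```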
It’s a JSON-encoded Unicode character. Decode the JSON, e.g. with json.loads, and you’ll get a Python string containing a red heart. If you need to encode that to UTF-8 bytes, use str.encode (though the library you want to use it with will most likely want normal Python strs).
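A minimal sketch of that advice, assuming the API payload arrives as the JSON text shown in the question (here the hard-coded raw string is a stand-in for msg.body):

```python
import json

# The payload from the question, as the raw JSON text an API would send.
# A raw string keeps the \u escapes intact so that json, not Python, decodes them.
raw = r'{"lat": null, "body_text": "@edinburgh \u2764\ufe0f", "deduplicated_time": "2020-11-05T15:38:11.744710"}'

msg_body = json.loads(raw, strict=False)
text = msg_body["body_text"]

# json.loads has already turned the escapes into real characters:
# text is an ordinary str ending with the two heart code points.
print(text == "@edinburgh \N{HEAVY BLACK HEART}\N{VARIATION SELECTOR-16}")  # True
print(len(text))  # 13

# Only if bytes are needed (e.g. for a socket or a file opened in binary mode):
encoded = text.encode("utf-8")
print(encoded)  # b'@edinburgh \xe2\x9d\xa4\xef\xb8\x8f'

# Decoding the bytes gives back the same str.
assert encoded.decode("utf-8") == text
```

In other words, msg_body["body_text"] is already a decoded Unicode string after json.loads, so there is no separate "convert to UTF-8" step to perform before handing it to a library that expects str.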