Introduction
I'm creating a scraper bot with telepot and selenium. When I get the text data that I need to send with the Telegram bot, it is unreadable because it contains Unicode escape sequences (emoji) in the wrong format, like:
"hi I like this emoji: \\u265B\\u2655"
Output
"hi I like this emoji: \u265B\u2655"
Needed Output
"hi I like this emoji: ♛♕"
In my case I can't use u"hi I like this emoji: \u265B\u2655" because my string is stored in a variable obtained with selenium and regex.
What i have tried
I tried json.loads("hi I like this emoji: \\u265B\\u2655") and got this:
Exception Raised
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Question
How can I format this string to obtain the needed output?
Edit
I tried this:
json.loads('"' + mystring + '"')
and I got:
json.decoder.JSONDecodeError: Invalid control character at: line 1 column 23 (char 22)
As asked in the comments, this is the result of print(repr(mystring)):
'La Spezia\\ud83d\\udccd\\n\\ud83d\\udcdaLiceo Scientifico Sportivo A. Pacinotti\\ud83c\\udfeb\\nITALIAN FENCER \\ud83c\\uddee\\ud83c\\uddf9 \\ud83e\\udd3a SPCS!!\\nELECTRIC BASS\\ud83c\\udfb8\\ud83c\\udfb6\\nBooks \\ud83d\\udcd6\\n2a T ( ESCONI ) \\ud83d\\ude0d \\ud83c\\udf93'
From your final edit, the scraped string looks like the body of a JSON-encoded string that was extracted directly out of some JSON data without its surrounding quotes. json.loads only accepts a string value when it is wrapped in double quotes, so add them before decoding:
>>> import json
>>> s='La Spezia\\ud83d\\udccd\\n\\ud83d\\udcdaLiceo Scientifico Sportivo A. Pacinotti\\ud83c\\udfeb\\nITALIAN FENCER \\ud83c\\uddee\\ud83c\\uddf9 \\ud83e\\udd3a SPCS!!\\nELECTRIC BASS\\ud83c\\udfb8\\ud83c\\udfb6\\nBooks \\ud83d\\udcd6\\n2a T ( ESCONI ) \\ud83d\\ude0d \\ud83c\\udf93'
>>> print(json.loads(f'"{s}"'))
La Spezia📍
📚Liceo Scientifico Sportivo A. Pacinotti🏫
ITALIAN FENCER 🇮🇹 🤺 SPCS!!
ELECTRIC BASS🎸🎶
Books 📖
2a T ( ESCONI ) 😍 🎓
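If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the name decode_json_escapes is made up for illustration, and it assumes the scraped text really is the body of a JSON string (so any double quotes inside it are already escaped as \"). Passing strict=False is an optional extra that lets json.loads accept raw control characters such as a real newline inside the string, which may be what triggered the "Invalid control character" error in your edit.

```python
import json


def decode_json_escapes(raw: str) -> str:
    """Decode JSON-style escapes (\\uXXXX, \\n, ...) in a bare string body.

    Hypothetical helper; assumes `raw` is the inside of a JSON string
    literal, i.e. without its surrounding double quotes.
    """
    # Wrap the text in double quotes so it becomes a valid JSON string.
    # strict=False allows raw control characters (e.g. a real newline)
    # inside the string instead of raising JSONDecodeError.
    return json.loads(f'"{raw}"', strict=False)


# A raw string literal stands in for the text scraped with selenium/regex.
scraped = r"hi I like this emoji: \u265B\u2655"
print(decode_json_escapes(scraped))  # prints: hi I like this emoji: ♛♕
```

Surrogate pairs such as \ud83d\udccd are combined into a single character by the JSON decoder, which is why the emoji in your longer example come out correctly. If the scraped text can contain an unescaped double quote, this will still raise JSONDecodeError, so you may want to catch that and fall back to the original string.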