Tags: python, python-3.x, unicode, python-unicode

Decode Unicode escapes in a string stored in a variable, then send it with telepot


Introduction

I'm creating a scraper bot with telepot and selenium. When I get the text data that I need to send with the Telegram bot, it is unreadable because it contains Unicode escape sequences (emoji) in the wrong format, like:

"hi I like this emoji: \\u265B\\u2655"

Output

"hi I like this emoji: \u265B\u2655"

Needed Output

"hi I like this emoji: ♛♕"

In my case I can't use u"hi I like this emoji: \u265B\u2655" because my string is stored in a variable obtained with selenium and a regex.

What I have tried

I tried json.loads("hi I like this emoji: \\u265B\\u2655") and got this:

Exception Raised

raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Question

How can I format this string to obtain the needed output?

Edit

I tried this:

json.loads('"' + mystring + '"')

and I got:

json.decoder.JSONDecodeError: Invalid control character at: line 1 column 23 (char 22)

As asked in the comments, this is the result of print(repr(mystring)):

'La Spezia\\ud83d\\udccd\\n\\ud83d\\udcdaLiceo Scientifico Sportivo A. Pacinotti\\ud83c\\udfeb\\nITALIAN FENCER \\ud83c\\uddee\\ud83c\\uddf9 \\ud83e\\udd3a SPCS!!\\nELECTRIC BASS\\ud83c\\udfb8\\ud83c\\udfb6\\nBooks \\ud83d\\udcd6\\n2a T ( ESCONI ) \\ud83d\\ude0d \\ud83c\\udf93'

Solution

  • From your final edit, the scraped string looks like the contents of a JSON string that was extracted directly from a JSON document somewhere, without its surrounding quotes. A JSON string must be wrapped in double quotes before json.loads can decode it:

    >>> import json
    >>> s='La Spezia\\ud83d\\udccd\\n\\ud83d\\udcdaLiceo Scientifico Sportivo A. Pacinotti\\ud83c\\udfeb\\nITALIAN FENCER \\ud83c\\uddee\\ud83c\\uddf9 \\ud83e\\udd3a SPCS!!\\nELECTRIC BASS\\ud83c\\udfb8\\ud83c\\udfb6\\nBooks \\ud83d\\udcd6\\n2a T ( ESCONI ) \\ud83d\\ude0d \\ud83c\\udf93'
    >>> print(json.loads(f'"{s}"'))
    La Spezia📍
    📚Liceo Scientifico Sportivo A. Pacinotti🏫
    ITALIAN FENCER 🇮🇹 🤺 SPCS!!
    ELECTRIC BASS🎸🎶
    Books 📖
    2a T ( ESCONI ) 😍 🎓
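
The same idea can be wrapped in a small function. This is only a sketch: the name `decode_json_escapes` is made up for illustration, and it assumes the scraped text is the body of a JSON string with no unescaped double quotes in it. Passing `strict=False` tells the decoder to accept literal control characters (such as a real newline) inside the string, which is the likely cause of the "Invalid control character" error shown in the question.

```python
import json


def decode_json_escapes(raw: str) -> str:
    """Decode JSON-style escapes (\\uXXXX, \\n, ...) in a scraped fragment.

    Hypothetical helper: assumes `raw` is the inside of a JSON string
    literal whose surrounding double quotes were stripped by the scraper.
    """
    # Re-add the double quotes so the fragment is a valid JSON string.
    # strict=False allows raw control characters (real newlines, tabs)
    # that would otherwise raise "Invalid control character".
    return json.loads(f'"{raw}"', strict=False)


# Escapes from the question: a backslash followed by 'u265B', not the emoji itself.
scraped = "hi I like this emoji: \\u265B\\u2655"
decoded = decode_json_escapes(scraped)

# Surrogate pairs such as \ud83d\udccd are combined into one code point.
place = decode_json_escapes("La Spezia\\ud83d\\udccd")
```

The decoded value is an ordinary `str`, so it can be passed straight to telepot, for example as the text argument of `bot.sendMessage(chat_id, decoded)`. If the scraped text can contain a bare `"` character, this approach fails and the quote would need to be escaped first.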