
Python: convert strings containing Unicode code points back into normal characters


I'm using the requests module to scrape text from a website and store it in a .txt file with code like the following:

import requests

r = requests.get(url)  # url is the page being scraped
with open("file.txt", "w") as filename:
    filename.write(r.text)

With this method, if "送分200000" were the only string that requests got from the URL, it would end up stored in file.txt like this:

\u9001\u5206200000
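Each \uXXXX group is the hexadecimal Unicode code point of one character. A quick sketch with the built-ins ord() and chr(), using only the two Chinese characters from the example, shows the mapping in both directions:

```python
text = "送分"

# Build the six-character escape text for each character, e.g. \u9001.
escapes = [f"\\u{ord(ch):04x}" for ch in text]
print(escapes)  # ['\\u9001', '\\u5206']

# Parse the hex digits after "\u" back into characters.
restored = "".join(chr(int(e[2:], 16)) for e in escapes)
print(restored)  # 送分
```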

When I later read the string back from file.txt, it doesn't convert back to "送分200000"; printing it still shows "\u9001\u5206200000". For example:

with open("file.txt", "r") as filename:
    mystring = filename.readline()
    print(mystring)

Output:
\u9001\u5206200000
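To see what the string actually holds, the sketch below recreates the file with the escaped text and reads it back. It assumes the file contains literal backslash-u sequences rather than the characters themselves; the escaped text is produced with encode("unicode_escape") purely for the demonstration, and a temporary directory stands in for the real file.txt. Opening a file in text mode only decodes bytes into characters; it never interprets \u escapes, which are a feature of Python string literals.

```python
import os
import tempfile

original = "送分200000"

# Produce the literal escape text (backslash, 'u', four hex digits per character).
escaped = original.encode("unicode_escape").decode("ascii")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")  # stand-in for the question's file.txt
    with open(path, "w", encoding="ascii") as f:
        f.write(escaped)
    with open(path, "r", encoding="ascii") as f:
        mystring = f.readline()

print(mystring)        # \u9001\u5206200000
print(repr(mystring))  # '\\u9001\\u5206200000'  (the backslashes are real characters)
print(len(mystring))   # 18: each escape is six separate characters
```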

Is there a way to convert this string, and others like it, back to the original text with the Unicode characters?


Solution

  • convert this string and others like it back to their original strings with unicode characters?

    Yes. Suppose file.txt contains

    \u9001\u5206200000
    

    Then read the file as bytes and decode it with the unicode_escape codec:

    with open("file.txt", "rb") as f:  # read raw bytes, not text
        content = f.read()
    text = content.decode("unicode_escape")  # turn \uXXXX escapes into characters
    print(text)
    

    Output:

    送分200000
    

    For more details, see the "Text Encodings" section of the codecs module documentation, where unicode_escape is listed among the Python-specific encodings.
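One caveat from that documentation: unicode_escape decodes from Latin-1, so a file that mixes real UTF-8 characters with escape sequences comes out garbled (a UTF-8 "é" becomes "Ã©"). The sketch below is an alternative, not part of the original answer: a hypothetical helper decode_u_escapes that uses a regular expression to replace only \uXXXX sequences and leaves everything else untouched. It assumes four-hex-digit escapes only; \UXXXXXXXX, \xXX, and surrogate pairs such as \ud83d\ude00 are not handled.

```python
import re

# Matches a literal backslash, 'u', and exactly four hex digits.
_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")

def decode_u_escapes(text: str) -> str:
    """Replace literal \\uXXXX sequences with the characters they name.

    Hypothetical helper: other escapes and surrogate pairs are left as-is.
    """
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

mixed = "café \\u9001\\u5206200000"
print(decode_u_escapes(mixed))  # café 送分200000

# For comparison, the unicode_escape route garbles the real "é":
print(mixed.encode("utf-8").decode("unicode_escape"))  # cafÃ© 送分200000
```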