python, unicode, flask, jinja2

Can't display Unicode chars with Flask


Some strings in my database contain Unicode characters that I can't display properly on my website. Interestingly, they do display correctly in one situation.

So it works when I do this:

@app.route('/')
def main():
    return render_template('home.html', text='\u00e9ps\u00e9g')
# displays: épség

But it does not work when I do this (query the database and pass the string from result):

@app.route('/')
def main():
    text_string = getText()
    return render_template('home.html', text=text_string)
# displays: \u00e9ps\u00e9g

However, when I paste exactly the string displayed by the second version into the first version as a literal, it works perfectly.

I would like to understand why the first version works and the second does not. Both strings should be the same, but the one fetched from the database is displayed with its escape sequences unchanged, while the one typed in manually renders correctly. Unfortunately I have hundreds of strings, so I need the second approach to work.


Solution

  • In one case you have Unicode escape sequences, each of which represents a single Unicode character. In the other case you have the literal characters \, u, 0, 0, e, 9, which are six separate characters. This can be illustrated with raw strings, which do not process Unicode escape sequences:

    >>> text = '\u00e9ps\u00e9g'
    >>> print(text)
    épség
    >>> text = r'\u00e9ps\u00e9g'
    >>> print(text)
    \u00e9ps\u00e9g
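
To make the difference concrete, here is a small sketch comparing the two forms by length. The variable names `escaped` and `literal` are illustrative only:

```python
# Normal literal: the parser turns the escape into one character (é).
escaped = '\u00e9'
# Raw literal: the backslash is kept, giving six separate characters.
literal = r'\u00e9'

print(len(escaped))            # 1
print(len(literal))            # 6
print(escaped == literal)      # False
print(literal == '\\u00e9')    # True: same six characters, written with an escaped backslash
```

A string read from a database behaves like the raw version: Python only processes escape sequences in source-code literals, not in data it receives at run time.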
    

    To convert a string containing literal escape sequences, you first need a byte string, which you then decode with the unicode_escape codec. If the only non-ASCII characters in the string are written as literal escape sequences, you can obtain the byte string by encoding with the ascii codec:

    >>> text = r'\u00e9ps\u00e9g'
    >>> print(text)
    \u00e9ps\u00e9g
    >>> print(text.encode('ascii').decode('unicode_escape'))
    épség
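
Since the question mentions hundreds of strings, the conversion could be wrapped in a helper. This is only a sketch: `decode_escapes` is a hypothetical name, and returning strings that already contain real non-ASCII characters unchanged is an assumption about the desired behaviour, not something the question specifies. Note that unicode_escape also interprets other backslash sequences such as \n and \t, so it may not suit text that contains genuine backslashes.

```python
def decode_escapes(text: str) -> str:
    """Convert literal backslash-u sequences in text into the characters they name."""
    try:
        raw = text.encode('ascii')
    except UnicodeEncodeError:
        # Assumption: text that already holds real non-ASCII characters
        # needs no conversion, so it is returned as-is.
        return text
    return raw.decode('unicode_escape')


print(decode_escapes(r'\u00e9ps\u00e9g'))  # épség
print(decode_escapes('épség'))             # épség (unchanged)
```

In the view from the question, this would be applied as `decode_escapes(getText())` before the result is passed to `render_template`.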
    

    From your comment, your text may come from a JSON data file. If it is valid JSON, this should decode it:

    >>> import json
    >>> s = r'"\u00e9ps\u00e9g \ud83c\udf0f"'
    >>> print(s)
    "\u00e9ps\u00e9g \ud83c\udf0f"
    >>> print(json.loads(s))
    épség 🌏
    

    Note that a JSON string must be enclosed in double quotes; json.loads will not decode it without them.
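
If the stored values are bare JSON string contents without the surrounding quotes, one possible workaround is to add the quotes before decoding. The helper name `decode_json_escapes` is hypothetical, and the sketch assumes the value contains no unescaped double quotes or control characters; otherwise `json.loads` raises `json.JSONDecodeError`.

```python
import json


def decode_json_escapes(bare: str) -> str:
    """Decode JSON-style escapes in a string that lacks its surrounding quotes."""
    # Assumption: `bare` has no unescaped double quotes or control characters.
    return json.loads('"' + bare + '"')


print(decode_json_escapes(r'\u00e9ps\u00e9g \ud83c\udf0f'))  # épség 🌏

# Without the double quotes the text is not valid JSON:
try:
    json.loads(r'\u00e9ps\u00e9g')
except json.JSONDecodeError as exc:
    print('not valid JSON:', exc)
```

As in the example above, `json.loads` combines the surrogate pair \ud83c\udf0f into a single character.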