A user enters a string on my website that contains a non-ASCII character.
The JavaScript saves their input, serializes it with JSON.stringify(), and sends it to the server.
The server, running Python 3, parses the JSON with json.loads, stores the string in a Node object, and then runs the line
print('looks like {}'.format(node_obj))
I receive the error
'ascii' codec can't encode character error '\u2212' in position 941: ordinal not in range(128)
It seems to me that the print function in Python 3 is trying to encode the Unicode string as ASCII (that is, convert it to a bytes object using the ascii codec).
Is it possible that my FreeBSD server does not support UTF-8, causing Python's print function to make this conversion? Or perhaps the string was never properly sanitized in the first place, and I should be doing that in the JavaScript when I first receive it from the user?
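A minimal sketch of what seems to be happening, assuming the problem is the encoding of the output stream rather than the JSON step. The payload and the in-memory streams below are hypothetical stand-ins for the real request and the server's stdout:

```python
import io
import json

# Hypothetical payload standing in for what JSON.stringify() sends.
payload = '{"text": "5 \\u2212 3"}'
text = json.loads(payload)["text"]  # a normal str containing U+2212

# Stand-in for a stdout whose encoding is ASCII, which can happen
# when no UTF-8 locale is set in the server's environment.
ascii_stdout = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
try:
    print("looks like {}".format(text), file=ascii_stdout)
    ascii_stdout.flush()
    failed = False
    message = ""
except UnicodeEncodeError as exc:
    failed = True
    message = str(exc)

# The same string prints without error when the stream is UTF-8.
utf8_buffer = io.BytesIO()
utf8_stdout = io.TextIOWrapper(utf8_buffer, encoding="utf-8")
print("looks like {}".format(text), file=utf8_stdout)
utf8_stdout.flush()
written = utf8_buffer.getvalue()
```

If this matches the real setup, the string itself is fine after json.loads and needs no sanitizing; only the encoding step inside print fails.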
Let me know what further information is useful to you.
What does the locale command say?
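The same information can be checked from inside Python. This is a rough sketch; describe_output_encoding is a hypothetical helper name, and the values it returns depend entirely on the environment the server process was started in:

```python
import locale
import sys


def describe_output_encoding():
    """Return the encodings Python picked up from the environment."""
    return {
        # Encoding that print() uses when writing to stdout.
        "stdout": sys.stdout.encoding,
        # Encoding derived from the locale settings (LANG, LC_ALL, ...).
        "preferred": locale.getpreferredencoding(False),
    }
```

Logging the result of describe_output_encoding() at server startup should show whether stdout ended up as ASCII (often reported as 'ANSI_X3.4-1968' or 'US-ASCII') instead of UTF-8.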
You can make Python use UTF-8 with either LANG=en_US.UTF-8 or PYTHONIOENCODING=utf-8.
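To see the effect of PYTHONIOENCODING without touching the server configuration, one option is to start a child interpreter with the variable set and ask it what it chose. This is only a sketch; child_stdout_encoding is a hypothetical helper, not part of any library:

```python
import os
import subprocess
import sys


def child_stdout_encoding(extra_env):
    """Run a child Python with extra environment variables and
    return the stdout encoding that child selected."""
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").strip()
```

For example, child_stdout_encoding({"PYTHONIOENCODING": "utf-8"}) should report utf-8 regardless of the locale. The variable has to be set in whatever environment launches the server process (service script, supervisor config, and so on), not only in an interactive shell.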
Setting LANG in the default environment is platform-dependent: https://unix.stackexchange.com/questions/342817/how-do-i-add-a-language-in-freebsd