I'm using Tornado to accept data sent from clients I don't have access to. Everything works fine if the data contains only English characters. When the data contains UTF-8 encoded Chinese characters (3 bytes each), Tornado gives me this warning and the 'get_argument' function can't get anything at all.
I debugged and reduced my code to the simplest case, yet the warning still comes up:
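import tornado.ioloop
import tornado.web

class DataHandler(tornado.web.RequestHandler):
    def post(self):
        print("test")
        # The value does not come back when the body contains raw UTF-8 Chinese characters.
        print(self.get_argument("data"))
        print("1")

application = tornado.web.Application([
    (r"/data", DataHandler),
])
application.listen(5000)
tornado.ioloop.IOLoop.instance().start()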
The data's format looks like this:
data={"id":"00f1c423","mac":"11:22:33:44:55:66"}
The data is x-www-form-urlencoded, and Wireshark shows that the Chinese characters are well-formed 3-byte UTF-8 sequences whose lead byte starts with E (1110). The position mentioned in the warning (774-777) is where the Chinese characters begin, and it's always 5 bytes, no matter which Chinese characters are sent.
I'm confused about the 'encode' in the warning. I did nothing about encoding in my own code, so I presume it's something Tornado does inside the RequestHandler class. But since Tornado defaults to the UTF-8 codec, where does this latin-1 come from? And most importantly, how can I fix it?
This is no longer a problem in current Tornado. Tornado has since changed its form parsing to support x-www-form-urlencoded bodies whose values contain raw encoded bytes that are not percent-encoded into ASCII.
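As for where the latin-1 comes from: my understanding (an assumption, I have not traced every Tornado version) is that the older form parser decoded the body as UTF-8 and then re-encoded each parsed value as latin-1 to get bytes back. That works for percent-encoded ASCII, but raw Chinese characters have no latin-1 representation, so the encode step fails. Here is a minimal sketch of that suspected mechanism using only the standard library. The function names and the "name" field are made up for illustration; this is not Tornado's actual code.

```python
from urllib.parse import parse_qs

# Hypothetical body: raw UTF-8 bytes inside a form value, not percent-encoded.
BODY = 'data={"name":"中文"}'.encode("utf-8")


def parse_decoding_utf8_first(body: bytes) -> dict:
    """Decode as UTF-8, parse, then turn values back into bytes via latin-1."""
    text = body.decode("utf-8")
    parsed = parse_qs(text, keep_blank_values=True, encoding="latin1", errors="strict")
    # This encode step raises for any character above U+00FF.
    return {k: [v.encode("latin1") for v in vs] for k, vs in parsed.items()}


def parse_decoding_latin1_first(body: bytes) -> dict:
    """Decode as latin-1 (one char per byte), parse, then encode back losslessly."""
    text = body.decode("latin1")
    parsed = parse_qs(text, keep_blank_values=True, encoding="latin1", errors="strict")
    return {k: [v.encode("latin1") for v in vs] for k, vs in parsed.items()}


try:
    parse_decoding_utf8_first(BODY)
    utf8_first_error = None
except UnicodeEncodeError as exc:
    # Same family of error as the warning in the question.
    utf8_first_error = exc

# The latin-1 round trip preserves the original bytes, which then decode as UTF-8.
round_tripped = parse_decoding_latin1_first(BODY)["data"][0].decode("utf-8")
```

Decoding as latin-1 first maps every byte to exactly one character, so encoding back gives the original bytes unchanged, and the handler can then decode them as UTF-8.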
See also: GitHub issue #2733
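If you are stuck on an older Tornado and can't upgrade, one possible workaround is to skip 'get_argument' and parse the raw body yourself. The sketch below assumes the clients really do send valid UTF-8; `parse_form_body` is a hypothetical helper name, and the "name" field in the example is made up.

```python
from urllib.parse import parse_qs


def parse_form_body(body: bytes) -> dict:
    """Parse a form body whose values may contain raw UTF-8 bytes.

    Returns the last value for each field name.
    """
    text = body.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
    parsed = parse_qs(text, keep_blank_values=True)
    return {name: values[-1] for name, values in parsed.items()}


# Inside the handler you would call it on the raw request body, e.g.:
#     data = parse_form_body(self.request.body)["data"]
example = parse_form_body('data={"id":"00f1c423","name":"中文"}'.encode("utf-8"))
```

Keep in mind that this still follows form-encoding rules: a literal '+' in a value becomes a space, and a literal '&' splits the value, so it only helps when the client leaves those characters out or percent-encodes them.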