
CherryPy doesn't properly handle non-ASCII characters in Jinja2 templates


I am trying to run a website using Python 2.7.1, Jinja 2.5.2, and CherryPy 3.1.2. My Jinja templates are UTF-8 encoded, and some of the non-ASCII characters in them come out as question marks and other gibberish. If I serve the template files directly, without going through Jinja, the problem does not appear. I can fix it by calling .encode("utf-8") on the output of every handler, but that clutters up my source. Does anyone know why this happens or what to do about it? I made a small script to demonstrate the problem. The "char.txt" file is a 2-byte file consisting solely of a UTF-8 encoded "»" character.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import cherrypy
import jinja2

jinja2env = jinja2.Environment(loader=jinja2.FileSystemLoader("."))

class Test(object):
    def test1(self):
        # Doesn't work: the character comes back as "?"
        # curl "http://example.com/test1"
        # ?
        return jinja2env.get_template("char.txt").render()
    test1.exposed = True

    def test2(self):
        # Works: the file is returned as-is, without Jinja
        # curl "http://example.com/test2"
        # »
        return open("char.txt").read()
    test2.exposed = True

    def test3(self):
        # Works, but it is annoying to have to call encode() all the time
        # curl "http://example.com/test3"
        # »
        return jinja2env.get_template("char.txt").render().encode("utf-8")
    test3.exposed = True

cherrypy.config["server.socket_port"] = 8500
cherrypy.quickstart(Test())
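
The three handlers differ only in whether they return unicode text or already-encoded bytes. Here is a minimal sketch of that difference using nothing but the standard library. The ASCII fallback with replacement is an assumption chosen to reproduce the "?" symptom; it is not a claim about what CherryPy 3.1.2 does internally.

```python
# -*- coding: utf-8 -*-
# Sketch only: unicode text versus encoded bytes for the "»" character.

# Unicode text, the kind of object Jinja2's render() returns under Python 2.
text = u"\u00bb"

# Explicit UTF-8 encoding, as in test3: two bytes, matching char.txt.
utf8_bytes = text.encode("utf-8")

# Assumed lossy fallback for illustration: ASCII cannot represent "»",
# so the "replace" error handler substitutes a question mark.
ascii_bytes = text.encode("ascii", "replace")

print(len(utf8_bytes))
print(repr(utf8_bytes))
print(repr(ascii_bytes))
```

The snippet runs under both Python 2 and Python 3; only the repr output differs between the two.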

Solution

  • From the CherryPy tutorial:

    tools.encode: automatically converts the response from the native Python Unicode string format to some suitable encoding (Latin-1 or UTF-8, for example).

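    That sounds like your answer. Jinja2's render() returns a unicode object, while open("char.txt").read() returns the raw UTF-8 bytes, which is why only the template handler is affected. Turning on tools.encode lets CherryPy do the encoding once, instead of every handler calling .encode("utf-8").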
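
A minimal sketch of turning the tool on for the whole application, assuming the `tools.encode.on` and `tools.encode.encoding` config keys behave as documented for your CherryPy version. `ENCODE_CONFIG` and `serve` are hypothetical names introduced here for illustration; they do not appear in the question's script.

```python
# -*- coding: utf-8 -*-
# Sketch only: enable CherryPy's encode tool for every path under "/",
# so unicode returned by handlers is encoded as UTF-8 before it is sent.

ENCODE_CONFIG = {
    "/": {
        "tools.encode.on": True,
        "tools.encode.encoding": "utf-8",
    }
}


def serve(root):
    """Start CherryPy with the encode tool enabled (hypothetical helper)."""
    # Imported here so the config above can be inspected without CherryPy.
    import cherrypy

    cherrypy.config.update({"server.socket_port": 8500})
    cherrypy.quickstart(root, "/", ENCODE_CONFIG)
```

In the question's script, calling `serve(Test())` in place of the last two lines should let test1 return the rendered template directly, with no .encode("utf-8") in the handler.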