Tags: python, encoding, python-babel, flask-babel

Flask-Babel fails to extract UTF-8 content


I have a translatable string in one of my Jinja2 templates:

Project can’t end sooner than it starts

(Note the non-ASCII typographic apostrophe, U+2019, in “can’t”.)

When I extract messages and update my translation files, both the template (.pot) and translation (.po) files have the following msgid:

msgid "Project canât end sooner than it starts"

It seems Babel decoded the UTF-8 bytes as if they were in some 8-bit character set.
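The symptom matches a classic mojibake pattern. The sketch below (plain Python 3, no Babel involved) assumes the culprit was a Latin-1 style decode: U+2019 is three bytes in UTF-8, and decoding those bytes one per character yields "â" followed by two C1 control characters that most editors render as nothing, which would leave "canât". A Windows-1252 decode would instead show "canâ€™t".

```python
# The typographic apostrophe in "can’t" is U+2019 RIGHT SINGLE QUOTATION MARK
text = "can\u2019t"

raw = text.encode("utf-8")        # one character becomes three bytes: e2 80 99
wrong = raw.decode("latin-1")     # Latin-1 maps each byte to its own character
cp1252 = raw.decode("cp1252")     # Windows-1252 maps 0x80/0x99 to visible glyphs
right = raw.decode("utf-8")       # decoding with the real encoding round-trips

print(repr(wrong))   # 'canâ\x80\x99t'
print(repr(cp1252))  # 'canâ€™t'
print(repr(right))   # 'can’t'

# A Latin-1 mis-decode is lossless, so it can be undone
assert wrong.encode("latin-1").decode("utf-8") == text
```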

My babel.cfg is a really short one:

    [python: **.py]
    [jinja2: **/templates/**.html]
    extensions=jinja2.ext.autoescape,jinja2.ext.with_,webassets.ext.jinja2.AssetsExtension
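Each section header pairs an extraction method with a glob pattern, and the keys under it are handed to that method as options. A rough sketch of that mapping using the standard library's configparser follows; parse_babel_cfg is a hypothetical helper that only approximates Babel's own mapping parser, so treat it as an illustration rather than Babel's API. With this config, the jinja2 section carries only extensions and no encoding option.

```python
import configparser

# The babel.cfg from the question, verbatim
BABEL_CFG = """\
[python: **.py]
[jinja2: **/templates/**.html]
extensions=jinja2.ext.autoescape,jinja2.ext.with_,webassets.ext.jinja2.AssetsExtension
"""


def parse_babel_cfg(text):
    """Hypothetical helper: split babel.cfg-style text into
    (method, pattern, options) tuples, one per section."""
    parser = configparser.RawConfigParser()  # raw: no %-interpolation of values
    parser.read_string(text)
    mapping = []
    for section in parser.sections():
        # Section headers look like "method: glob pattern"
        method, pattern = (part.strip() for part in section.split(":", 1))
        # Every key under the header becomes an extractor option (values are strings)
        mapping.append((method, pattern, dict(parser.items(section))))
    return mapping


for method, pattern, options in parse_babel_cfg(BABEL_CFG):
    print(method, pattern, sorted(options))
```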

Is there a way to tell Babel that the template is already UTF-8, so it does not decode it from whatever charset it assumes? I can’t see any related option in the output of pybabel extract --help.

For the record, I use Python 3 exclusively.


Solution

  • It turns out this is supported out of the box; it just seems to be undocumented. All I had to do was change the configuration:

    [python: **.py]
    [jinja2: **/templates/**.html]
    encoding=utf-8
    extensions=jinja2.ext.autoescape,jinja2.ext.with_,webassets.ext.jinja2.AssetsExtension
    

    The encoding=utf-8 line did the trick: all HTML files are now read as UTF-8.
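Why a per-section key helps: Babel calls each extraction method with the documented signature (fileobj, keywords, comment_tags, options), passes the file as bytes, and the method is responsible for decoding it. The toy extractor below is hypothetical (extract_toy, its regex, and its Latin-1 fallback are assumptions chosen to reproduce the symptom, not what Babel or Jinja2 actually default to). It shows that the same template yields a garbled or a correct msgid depending only on whether options contains "encoding".

```python
import io
import re

# Toy pattern: _("...") with a double-quoted literal, no escape handling
_CALL = re.compile(r'_\(\s*"([^"]*)"\s*\)')


def extract_toy(fileobj, keywords, comment_tags, options):
    """Hypothetical Babel-style extraction method.

    Uses the (fileobj, keywords, comment_tags, options) signature and yields
    (lineno, funcname, message, comments) tuples. keywords and comment_tags
    are accepted but ignored in this toy.
    """
    # Options from the babel.cfg section arrive as plain strings
    encoding = options.get("encoding", "latin-1")  # assumed fallback, illustration only
    source = fileobj.read().decode(encoding)
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _CALL.finditer(line):
            yield lineno, "_", match.group(1), []


template = '<p>{{ _("Project can\u2019t end sooner than it starts") }}</p>\n'.encode("utf-8")

without_option = list(extract_toy(io.BytesIO(template), ["_"], [], {}))
with_option = list(extract_toy(io.BytesIO(template), ["_"], [], {"encoding": "utf-8"}))
print(without_option[0][2])  # garbled: "canâ" plus two invisible control characters
print(with_option[0][2])
```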