I have read lots of posts about using Python gettext, but none of them addressed the issue of changing languages at runtime.
Using gettext, strings are translated by the function _(), which is added globally to builtins. The definition of _ is language-specific and will change during execution when the language setting changes. At certain points in the code, I need strings in an object to be translated to a certain language. This happens by:

1. Re-defining the _ function in builtins to translate to the chosen language
2. Re-evaluating the object using the new _ function, guaranteeing that any calls to _ within the object definition are evaluated using the current definition of _.

I am wondering about different approaches to step 2. I thought of several, but they all seem to have fundamental flaws.
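For concreteness, step 1 might look roughly like this (a minimal sketch; the domain name 'myapp', the locale directory, and the available languages are all assumptions):

import gettext

def set_language(lang):
    # Assumes compiled catalogs exist at ./locale/<lang>/LC_MESSAGES/myapp.mo
    translation = gettext.translation('myapp', localedir='locale', languages=[lang])
    translation.install()  # (re)defines _ in builtins for the chosen language

set_language('de')
print(_("Hello"))  # translated with the German catalog
set_language('fr')
print(_("Hello"))  # same call, French result, because _ was re-defined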
If all translated text is defined in functions that can be called in step 2, then it's straightforward: calling the function will evaluate using the current definition of _. But there are lots of situations where that's not the case; for instance, translated strings could be module-level variables evaluated at import time, or attributes evaluated when instantiating an object.
A minimal example of this problem with module-level variables is here.
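Roughly, the situation is this (messages.py is a hypothetical module; set_language() is the sketch from above):

# messages.py
GREETING = _("Hello")        # evaluated once, at import time, with whatever _ is current

# main.py
set_language('de')
import messages
print(messages.GREETING)     # German

set_language('fr')
print(messages.GREETING)     # still German: the module-level string is never re-evaluated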
Module-level variables can be re-evaluated at the desired time using importlib.reload. This gets more complicated if the module imports another module that also has translated strings. You have to reload every module that's a (nested) dependency.
With knowledge of the module's implementation, you can manually reload the dependencies in the right order: if A imports B,
import importlib
importlib.reload(B)   # reload the dependency first, so A picks up its new strings
importlib.reload(A)
# use A...
Problems: Requires knowledge of the module's implementation. Only reloads module-level variables.
Without knowledge of the module's implementation, you'd need to automate reloading dependencies in the right order. You could do this for every module in the package, or just the (recursive) dependencies. To handle more complex situations, you'd need to generate a dependency graph and reload modules in breadth-first order from the roots.
Problems: Requires a complex reloading algorithm. There are likely edge cases where it's not possible (cyclic dependencies, unusual package structure, from X import Y-style imports). Only reloads module-level variables.
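For illustration, a rough sketch of the automated variant described above, covering only plain import X dependencies inside one package and making no real attempt at cycles or from X import Y:

import importlib
import sys
import types

def reload_package(package_name):
    # Every loaded module that belongs to the package.
    mods = {name: mod for name, mod in sys.modules.items()
            if name == package_name or name.startswith(package_name + '.')}

    # Dependencies found by looking for module objects bound in each module's namespace.
    # Note: "from X import Y" binds Y, not the module X, so it is invisible here.
    deps = {name: {m.__name__ for m in vars(mod).values()
                   if isinstance(m, types.ModuleType) and m.__name__ in mods}
            for name, mod in mods.items()}

    done = set()
    def visit(name):                 # reload dependencies before dependents
        if name in done:
            return
        done.add(name)
        for dep in deps[name]:
            visit(dep)
        importlib.reload(mods[name])

    for name in mods:
        visit(name)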
eval allows you to evaluate dynamically generated expressions. Instead, could you re-evaluate an existing object's static expression, given a dynamic context (builtins._)? I guess this would involve recursively re-evaluating the object, and every object referenced in its definition, and every object referenced in their definitions...
I looked through the inspect module and didn't find any obvious solution.
Problems: Not sure if this is possible. Security issues with eval and similar.
The Flask-Babel project provides a LazyString that delays evaluation of a translated string. If it could be completely delayed until step 2, that seems like the cleanest solution.
Problems: A LazyString can still get evaluated before it's supposed to be. Lots of things may call its __str__ method and trigger evaluation, such as string formatting and concatenation.
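For example, with Flask-Babel's lazy_gettext (assuming a Flask app with Babel configured), seemingly harmless code forces the translation right away:

from flask_babel import lazy_gettext as _l

title = _l("Settings")        # a LazyString: translation is deferred for now
banner = "== %s ==" % title   # %-formatting calls __str__, so it is translated here and now
print(banner)                 # later language changes no longer affect banner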
The Python gettext docs demonstrate temporarily re-defining the _ function, and only calling the actual translation function when the translated string is needed.
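That pattern looks roughly like this (condensed from the deferred-translations example in the docs):

def _(message): return message          # temporary no-op _, only marks the strings

animals = [_('mollusk'), _('albatross'), _('penguin')]

del _                                   # fall back to the real, translating _ in builtins

# later, once the language is known:
for a in animals:
    print(_(a))                         # the actual translation happens here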
Problems: Requires knowledge of the object's structure, and code customized to each object, to find the strings to translate. Doesn't allow concatenation or formatting of translated strings.
All translated strings could be factored out into a separate module, or moved to functions such that they can be completely evaluated at a given time.
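For instance, the module-level GREETING from the earlier example would become a function, so that nothing is evaluated at import time:

# messages.py, refactored
def greeting():
    return _("Hello")     # evaluated on every call, with the current _

# callers now use messages.greeting() instead of messages.GREETING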
Problems: As I understand it, the point of gettext and the global _ function is to minimize the impact of translation on existing code. Refactoring like this could require significant design changes and make the code more confusing.
The only plausible, general approach is to rewrite all relevant code to not only use _ to request translation but to never cache the result. That’s not a fun idea, and it’s not a new one (you already list Refactoring and Deferred translation, which rely on the cooperation of the gettext clients), but it is the “best way […] in practice”.
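In practice that means every cached translation becomes a lookup at display time; a minimal illustration (the class and names are made up):

class SaveButton:
    # Before: self.label = _("Save")  -- cached once, in whatever language was active
    @property
    def label(self):
        return _("Save")   # looked up on every access, always in the current language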
You can try to do a super-reload by removing many things from sys.modules and then doing a real reimport. This approach avoids understanding the import relationships, but works only if the relevant modules are all written in Python and you can guarantee that the state of your program will retain no references to any objects (including types and modules) that used the old language. (I’ve done this, but only in a context where the overarching program was a sort of supervisor utterly uninterested in the features of the discarded modules.)
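A rough sketch of that kind of super-reload, assuming everything translatable lives under a single hypothetical package myapp:

import sys

def purge_package(package_name):
    # Forget every module of the package so the next import is a genuinely fresh one.
    for name in [n for n in sys.modules
                 if n == package_name or n.startswith(package_name + '.')]:
        del sys.modules[name]

# after re-installing _ for the new language:
purge_package('myapp')
import myapp              # re-executed from scratch, so module-level _() calls run again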
You can try to walk the whole object graph and replace the strings, but even aside from the intrinsic technical difficulty of such an algorithm (consider __slots__ in base classes and co_consts for just the mildest taste), it would involve untranslating them, which changes from hard to impossible when some sort of transformation has already been performed. That transformation might just be concatenating the translated strings, or it might be pre-substituting known values to format, or padding the string, or storing a hash of it: it’s certainly undecidable in general. (I’ve done this too for other data types, but only with data constructed by a file reader whose output used known, simple structures.)
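To give just a taste, here is the easy fraction of such a walk: only plain instance attributes, relying on a reverse catalog that maps old translations back to their msgids, which is exactly the untranslation step that stops working once strings have been transformed:

def retranslate(obj, old_to_msgid, new_gettext, seen=None):
    # old_to_msgid: maps previously translated strings back to their msgids
    # new_gettext:  the _ function for the newly selected language
    if seen is None:
        seen = set()
    if id(obj) in seen or not hasattr(obj, '__dict__'):
        return
    seen.add(id(obj))
    for name, value in vars(obj).items():
        if isinstance(value, str) and value in old_to_msgid:
            setattr(obj, name, new_gettext(old_to_msgid[value]))
        else:
            retranslate(value, old_to_msgid, new_gettext, seen)
# __slots__, containers, dict keys, closures, co_consts, ... are all untouched here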
Any approach based on partial reevaluation combines the problems of the methods above.
The only other possible approach is a super-LazyString that refuses to translate for longer by implementing operations like + to return objects that encode the transformations to eventually apply, but it’s impossible to know when to force those operations unless you control all mechanisms used to display or transmit strings. It’s also impossible to defer past, say, if len(_("…"))>80:.
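A sketch of what such an object might look like, with only + deferred (every name here is made up for illustration, and _ is assumed to be installed in builtins as in the question):

class DeferredText:
    def __init__(self, msgid):
        self.msgid = msgid
    def __add__(self, other):
        return DeferredConcat(self, other)   # record the operation instead of doing it
    def __str__(self):
        return _(self.msgid)                 # forcing: translate with the current _

class DeferredConcat:
    def __init__(self, left, right):
        self.left, self.right = left, right
    def __add__(self, other):
        return DeferredConcat(self, other)
    def __str__(self):
        return str(self.left) + str(self.right)

greeting = DeferredText("Hello") + ", world"   # nothing translated yet
print(greeting)                                # print() calls str(), forcing it now
if len(str(greeting)) > 80:                    # the length test needs a real string,
    pass                                       # so deferral ends here no matter what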