python architecture internationalization gettext

Approaches to changing language at runtime with python gettext

I have read lots of posts about using Python gettext, but none of them addressed the issue of changing languages at runtime.

Using gettext, strings are translated by the function _() which is added globally to builtins. The definition of _ is language-specific and will change during execution when the language setting changes. At certain points in the code, I need strings in an object to be translated to a certain language. This happens by:

(Re)define the _ function in builtins to translate to the chosen language
(Re)evaluate the desired object using the new _ function - guaranteeing that any calls to _ within the object definition are evaluated using the current definition of _.
Return the object

I am wondering about different approaches to step 2. I thought of several but they all seem to have fundamental flaws.

What is the best way to achieve step 2 in practice?
Is it theoretically possible to achieve step 2 for any arbitrary object, without knowledge of its implementation?

Possible approaches

If all translated text is defined in functions that can be called in step 2, then it's straightforward: calling the function will evaluate using the current definition of _. But there are lots of situations where that's not the case, for instance, translated strings could be module-level variables evaluated at import time, or attributes evaluated when instantiating an object.

Minimal example of this problem with module-level variables is here.

Re-evaluation

Manually reload modules

Module-level variables can be re-evaluated at the desired time using importlib.reload. This gets more complicated if the module imports another module that also has translated strings. You have to reload every module that's a (nested) dependency.

With knowledge of the module's implementation, you can manually reload the dependencies in the right order: if A imports B,

importlib.reload(B)
importlib.reload(A)
# use A...

Problems: Requires knowledge of the module's implementation. Only reloads module-level variables.

Automatically reload modules

Without knowledge of the module's implementation, you'd need to automate reloading dependencies in the right order. You could do this for every module in the package, or just the (recursive) dependencies. To handle more complex situations, you'd need to generate a dependency graph and reload modules in breadth-first order from the roots.

Problems: Requires complex reloading algorithm. There are likely edge cases where it's not possible (cyclic dependencies, unusual package structure, from X import Y-style imports). Only reloads module-level variables.

Re-evaluate only the desired object?

eval allows you to evaluate dynamically generated expressions. Instead could you re-evaluate an existing object's static expression, given a dynamic context (builtins._)? I guess this would involve recursively re-evaluating the object, and every object referenced in its definition, and every object referenced in their definitions... I looked through the inspect module and didn't find any obvious solution.

Problems: Not sure if this is possible. Security issues with eval and similar.

Delayed evaluation

Lazy evaluation

The Flask-Babel project provides a LazyString that delays evaluation of a translated string. If it could be completely delayed until step 2, that seems like the cleanest solution.

Problems: A LazyString can still get evaluated before it's supposed to. Lots of things may call its __str__ function and trigger evaluation, such as string formatting and concatenating.

Deferred translation

The python gettext docs demonstrate temporarily re-defining the _ function, and only calling the actual translation function when the translated string is needed.

Problems: Requires knowledge of the object's structure, and code customized to each object, to find the strings to translate. Doesn't allow concatenation or formatting of translated strings.

Refactoring

All translated strings could be factored out into a separate module, or moved to functions such that they can be completely evaluated at a given time.

Problems: As I understand it the point of gettext and the global _ function is to minimize the impact of translation on existing code. Refactoring like this could require significant design changes and make the code more confusing.

Solution

The only plausible, general approach is to rewrite all relevant code to not only use _ to request translation but to never cache the result. That’s not a fun idea and it’s not a new idea—you already list Refactoring and Deferred translation that rely on the cooperation of the gettext clients—but it is the “best way […] in practice”.

You can try to do a super-reload by removing many things from sys.modules and then doing a real reimport. This approach avoids understanding the import relationships, but works only if the relevant modules are all written in Python and you can guarantee that the state of your program will retain no references to any objects (including types and modules) that used the old language. (I’ve done this, but only in a context where the overarching program was a sort of supervisor utterly uninterested in the features of the discarded modules.)

You can try to walk the whole object graph and replace the strings, but even aside from the intrinsic technical difficulty of such an algorithm (consider __slots__ in base classes and co_consts for just the mildest taste), it would involve untranslating them, which changes from hard to impossible when some sort of transformation has already been performed. That transformation might just be concatenating the translated strings, or it might be pre-substituting known values to format, or padding the string, or storing a hash of it: it’s certainly undecidable in general. (I’ve done this too for other data types, but only with data constructed by a file reader whose output used known, simple structures.)

Any approach based on partial reevaluation combines the problems of the methods above.

The only other possible approach is a super-LazyString that refuses to translate for longer by implementing operations like + to return objects that encode the transformations to eventually apply, but it’s impossible to know when to force those operations unless you control all mechanisms used to display or transmit strings. It’s also impossible to defer past, say, if len(_("…"))>80:.