I am trying to access variables from a module that is modified while the main script runs, using reload. But reload fails to erase the variables that have been removed from the module. How can I force Python to erase them?
Here is my code.
My module.py:
a = 1
b = 2
My main script.py:
import time
from importlib import reload
import module
while True:
    reload(module)
    print('------- module reloaded')
    try:
        print('a: ', module.a)
    except AttributeError:
        print('a: ', 'undefined')
    try:
        print('b: ', module.b)
    except AttributeError:
        print('b: ', 'undefined')
    try:
        print('c: ', module.c)
    except AttributeError:
        print('c: ', 'undefined')
    time.sleep(5)
As expected, if I run my script with Python (3.5.1) I get the output:
------- module reloaded
a: 1
b: 2
c: undefined
But I get unexpected behavior when I change module.py as follows:
# a = 1
b = 3
c = 4
I then get the following output:
------- module reloaded
a: 1
b: 3
c: 4
This means that reload correctly updated the value of b and added the new variable c, but it failed to erase the variable a, which has been removed from the module. It seems to update only the variables that are found in the new version of the module. How can I force the reload function to erase removed values?
Thank you for your help
The issue here is that reload is implemented with code that, roughly, execs the current version of the module code in the existing cached module's namespace. The reason for doing this is that reload is intended to have a global effect; reload(module) doesn't just reload it for you, it changes every other module's personally imported copy of module. If it created a new module namespace, other modules would still have the old module cached; while erasing the contents of the old module before execing might help, it would risk breaking already imported submodules of a package, trigger race conditions (where a thread might see module.a before and after the reload, but it would mysteriously disappear for a moment during the reload), etc.
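This in-place behavior is easy to observe: reload returns the very same module object it was given, re-executed in its existing namespace, so every holder of a reference sees the update. A quick sanity check (using the stdlib json module purely as a harmless reload target):

```python
from importlib import reload
import json

# Keep a second reference to the module object before reloading.
before = json
reloaded = reload(json)  # re-executes json's code in the same namespace

# All three names point at the very same module object.
assert reloaded is json is before
print(reloaded is before)  # True
```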
As the docs note:
When a module is reloaded, its dictionary (containing the module’s global variables) is retained. Redefinitions of names will override the old definitions, so this is generally not a problem. If the new version of a module does not define a name that was defined by the old version, the old definition remains.
There are workarounds that bypass this safety mechanism if you absolutely must do it. The simplest is to remove the module from the module cache and reimport it, rather than reloading it:
import sys # At top of file
del sys.modules['module']
import module
That won't update any other importers of the module (they'll keep the stale cache), but if the module is only used in your module, that'll work.
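To see the difference concretely, here is a self-contained sketch (the scratch_mod name and the temp directory are purely illustrative) that writes a throwaway module, removes a name from it, evicts it from sys.modules, and reimports it:

```python
import importlib
import os
import sys
import tempfile

# Don't write .pyc files, so every (re)import reads the current source.
sys.dont_write_bytecode = True

tmpdir = tempfile.mkdtemp()
sys.path.insert(0, tmpdir)
path = os.path.join(tmpdir, 'scratch_mod.py')  # illustrative module name

with open(path, 'w') as f:
    f.write('a = 1\n')
import scratch_mod
assert scratch_mod.a == 1

with open(path, 'w') as f:
    f.write('b = 2\n')           # the new version no longer defines "a"

importlib.invalidate_caches()    # make sure the finders see the new file
del sys.modules['scratch_mod']   # evict the cached module object
import scratch_mod               # fresh import into a brand-new namespace

assert not hasattr(scratch_mod, 'a')  # the removed name is really gone
assert scratch_mod.b == 2
```

Because the second import builds a brand-new namespace, removed names simply never appear in it.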
Another approach that might work (untested, and it's kind of insane) would be to explicitly delete all the public names from the module before reloading, with something like:
# Intentionally done as a list comprehension so the module's globals dict
# isn't modified while we're iterating over it.
# It might make sense to use dir() or to iterate module.__all__ (if available)
# instead of vars(); it depends on the design.
for name in [n for n in vars(module) if not n.startswith('_')]:
    try:
        delattr(module, name)
    except Exception:
        pass  # Undeletable attribute for whatever reason; ignore it and let reload deal with it

reload(module)  # Now exec occurs in a clean namespace
That would avoid the stale cache issue, in exchange for insanity.
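If you go that route, the delattr loop above could be packaged into a small reusable helper; the name clean_reload here is just an illustrative choice, not a standard function:

```python
from importlib import reload


def clean_reload(module):
    """Delete all public attributes of *module*, then reload it in place.

    Hypothetical helper: wipes the public names so that names removed
    from the source file don't linger after the reload.
    """
    # Snapshot the names in a list so the globals dict isn't mutated
    # while we iterate over it.
    for name in [n for n in vars(module) if not n.startswith('_')]:
        try:
            delattr(module, name)
        except Exception:
            pass  # undeletable attribute; let reload sort it out
    return reload(module)
```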
Really, the answer is "don't use a design that depends on production reloading"; if the module is just data, store it as a JSON file or the like and just reparse it (which is generally much cheaper than what the Python import machinery goes through to import a module).
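That data-file alternative is a few lines; this sketch (the settings.json filename and load_settings name are illustrative) reparses the file each cycle, so removed keys simply vanish from the resulting dict:

```python
import json


def load_settings(path='settings.json'):
    """Reparse the JSON data file on each call (illustrative helper)."""
    with open(path) as f:
        return json.load(f)


# In the polling loop you would call load_settings() each iteration and
# look values up with dict.get, which handles removed keys gracefully:
# settings = load_settings()
# print('a:', settings.get('a', 'undefined'))
```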