Let test be the module we run as __main__. This module contains one global variable named primes, which is initialized in the module with the following assignment.
primes = []
The module also contains a function named pi, which alters this global variable:
def pi(n):
    global primes
    """Some code that modifies the global 'primes' variable"""
I then want to time said function using the built-in timeit module. I want to use the timeit.repeat function and take the minimum of the timings, as a way of improving the measurement's accuracy (instead of measuring just once, which may be subject to slow-down due to unrelated processes).
import timeit

# Best of 10 single runs, converted from seconds to milliseconds.
print(min(timeit.repeat('test.pi(50000)',
                        setup="import test",
                        number=1, repeat=10)) * 1000)
The problem is that the pi function behaves differently depending on the value of primes: I expected that, for each repetition, the import test statement in the setup parameter would re-run the primes = [] statement in the test module, thus 'resetting' primes so that the code being executed would be identical for each repetition. But, instead, the value of primes that resulted from the previous execution is used, so I had to add the statement test.primes = [] to the setup parameter:
print(min(timeit.repeat('test.pi(50000)',
                        setup="import test\ntest.primes = []",
                        number=1, repeat=10)) * 1000)
This leads me to the question: is there a direct way (i.e. in one statement) to 'reset' the values of all the global variables to what they were when they were first assigned in the module?
In this specific scenario adding that one statement to manually 'reset' primes works fine, but consider a case in which there are a lot of global variables, and you want to 'reset' all of them.
Why doesn't the statement import test re-run the initial primes = [] assignment?
Let's start with your side question, because it turns out that it's actually central to everything:
"Why doesn't the statement import test re-run the initial primes = [] assignment?"
Because, as explained in the docs on the import system and the import statement, what import test does is, loosely, this pseudocode:
if 'test' not in sys.modules:
    find, load (compiling if needed), and exec the module
    sys.modules['test'] = result
test = sys.modules['test']
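Here is a minimal sketch of that caching behaviour. It assumes a throwaway module named demo_mod (a hypothetical stand-in for test, written to a temporary directory purely for the demonstration):

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the question's module: one global, one mutator.
SOURCE = "primes = []\n\ndef pi(n):\n    primes.append(n)\n"

tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "demo_mod.py").write_text(SOURCE)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()   # make sure the new file is visible to the finders

import demo_mod                 # first import: the module body runs, primes = []
demo_mod.pi(2)                  # mutate the global

import demo_mod                 # second import: found in sys.modules, body is not re-run
second_import_state = list(demo_mod.primes)

# The name is simply rebound to the cached module object.
is_cached_object = sys.modules["demo_mod"] is demo_mod
```

After the second import, demo_mod.primes still holds the value left behind by the earlier call, which is the effect seen across timeit repetitions.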
OK, but why does it do that?
If you have two modules that both import the same module, they expect to see the same globals. And remember that types, functions, etc. defined at the top level of a module are all globals. For example, if sortedlist.py imports collections.abc to class SortedList(collections.abc.Sequence):, and scraper.py imports collections.abc to isinstance(something, collections.abc.Sequence), you'd want a SortedList to pass that test, but it won't if those are two completely independent types because they came from two different module objects that happen to have the same name.
If you have 12 modules that all import pandas as pd, and every import re-ran the module, you'd be running all the Pandas initialization code 12 times. Except that some of your modules also probably import each other, so they'd each be run multiple times, and import Pandas each time. How long do you think it would take to run all the Pandas initialization 60 times?
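The type-identity problem from the first point can be illustrated with a small sketch. It assumes a hypothetical one-class module, demo_abc, that is deliberately loaded twice without going through the sys.modules cache; the helper load_uncached is also made up for the example:

```python
import importlib.util
import tempfile
from pathlib import Path

# Hypothetical module defining a single base class.
SOURCE = "class Sequence:\n    pass\n"
module_path = Path(tempfile.mkdtemp(), "demo_abc.py")
module_path.write_text(SOURCE)


def load_uncached(name, path):
    """Execute the file as a new module object, bypassing sys.modules."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Two independent module objects built from the same source file.
copy_a = load_uncached("demo_abc", module_path)
copy_b = load_uncached("demo_abc", module_path)


class SortedList(copy_a.Sequence):
    pass


item = SortedList()
passes_same_copy = isinstance(item, copy_a.Sequence)    # same class object
passes_other_copy = isinstance(item, copy_b.Sequence)   # a different class with the same name
```

The two Sequence classes share a name and source but are distinct objects, so the isinstance check only succeeds against the copy that SortedList actually inherited from.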
So, reusing existing modules is almost always what you want.
And when you don't, that's usually a sign that there's something wrong with your design (which may well be the case here).
But "almost always" isn't "always". So there are ways around it. None of them are usually a good idea for live code, but for things like unit tests and benchmarking, there are three basic options that are all fine, as long as the tradeoffs are the ones you want:
1. del sys.modules['test']. This is obviously pretty hacky, but it actually does exactly what you want here. Any existing references to the old module are completely untouched, but the next time anyone does import test, they're going to get a brand-new test module.
2. importlib.reload(test). This sounds great, but it may on the one hand be overkill (notice that it forces the module source to be recompiled, which you don't need), while on the other it may not be sufficient (it doesn't actually reset the globals: if your code does primes = [] at the top level, that line gets executed, so who cares, but if your code instead does, say, globals().setdefault('primes', []) inside the pi function, you care).
3. Instead of import test, manually do all the steps up through executing the module (see the examples in the importlib docs), but don't store it in sys.modules['test'] or in test, just store it in a local variable you discard after each test. This is probably the cleanest, although it does mean 6 lines of code instead of 1.
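The three options above can be sketched roughly as follows. This is only an illustration under stated assumptions: demo_reset is a hypothetical stand-in for test, written to a temporary directory, and load_fresh is a made-up helper name. The last part shows one way the first option might be wired into the setup string of timeit.repeat.

```python
import importlib
import importlib.util
import sys
import tempfile
import timeit
from pathlib import Path

# Hypothetical stand-in for the question's module.
SOURCE = "primes = []\n\ndef pi(n):\n    primes.append(n)\n"
tmp_dir = tempfile.mkdtemp()
module_path = Path(tmp_dir, "demo_reset.py")
module_path.write_text(SOURCE)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

import demo_reset
demo_reset.pi(2)
old_module = demo_reset

# Option 1: drop the cache entry so the next import builds a new module.
del sys.modules["demo_reset"]
import demo_reset
after_del = list(demo_reset.primes)        # state of the brand-new module
old_untouched = list(old_module.primes)    # old references keep the old state

# Option 2: reload re-executes the module body inside the same module object.
demo_reset.pi(3)
reloaded = importlib.reload(demo_reset)
after_reload = list(demo_reset.primes)
same_object = reloaded is demo_reset


# Option 3: load a private copy that never touches sys.modules.
def load_fresh(name, path):
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


demo_reset.pi(5)
private = load_fresh("demo_reset", module_path)
after_private = list(private.primes)
cached_state = list(sys.modules["demo_reset"].primes)

# Option 1 inside timeit: the setup runs before every repetition,
# so each timed call sees a freshly executed module.
setup = "import sys; sys.modules.pop('demo_reset', None); import demo_reset"
timings = timeit.repeat("demo_reset.pi(1)", setup=setup, number=1, repeat=3)
state_after_timing = list(sys.modules["demo_reset"].primes)
```

Note that the setup string now includes the cost of re-executing the module, but timeit does not count setup time in the reported timings, so only the pi call itself is measured.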