
Reset global variables in timeit.repeat


Scenario

Let test be the module we run as __main__. This module contains one global variable named primes, which is initialized in the module with the following assignment.

primes = []

The module also contains a function named pi, which alters this global variable:

def pi(n):
    """Some code that modifies the global 'primes' variable"""
    global primes

I then want to time this function using the built-in timeit module. I want to use the timeit.repeat function and take the minimum of the timings to improve the measurement's accuracy, since a single measurement may be slowed down by unrelated processes.

import timeit

print(min(timeit.repeat('test.pi(50000)',
                        setup="import test",
                        number=1, repeat=10)) * 1000)

The problem is that the pi function behaves differently depending on the value of primes. I expected that, for each repetition, the import test statement in the setup parameter would re-run the primes = [] assignment in test, thus 'resetting' primes so that every repetition executes identical code. Instead, each repetition uses the value of primes left over from the previous one, so I had to add the statement test.primes = [] to the setup parameter:

print(min(timeit.repeat('test.pi(50000)',
                        setup="import test\ntest.primes = []",
                        number=1, repeat=10)) * 1000)
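To make the behavior above reproducible, here is a minimal sketch. The question doesn't show the body of pi, so the module below is a hypothetical stand-in named primes_demo (a name chosen because test can clash with the standard library's own test package); its pi simply appends every prime below n to the global list. The sketch writes that module to a temporary directory, then runs timeit.repeat with and without the reset in setup.

```python
import importlib
import os
import sys
import tempfile
import timeit

# Hypothetical stand-in for the question's 'test' module (the real pi() is not shown):
# pi() appends every prime below n to the module-level 'primes' list.
MODULE_SOURCE = '''
primes = []

def pi(n):
    """Append the primes below n to the global 'primes' list."""
    global primes  # mirrors the question; not strictly needed for append()
    for k in range(2, n):
        if all(k % d for d in range(2, int(k ** 0.5) + 1)):
            primes.append(k)
    return len(primes)
'''

tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "primes_demo.py"), "w") as f:
    f.write(MODULE_SOURCE)
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

# setup only imports: repetitions 2 and 3 reuse the cached module, so 'primes' keeps growing.
timeit.repeat("primes_demo.pi(100)", setup="import primes_demo", number=1, repeat=3)
import primes_demo
grown = len(primes_demo.primes)  # 3 runs x 25 primes below 100

# setup also rebinds the global, so every repetition starts from an empty list.
timeit.repeat("primes_demo.pi(100)",
              setup="import primes_demo\nprimes_demo.primes = []",
              number=1, repeat=3)
reset = len(primes_demo.primes)  # only the last run's 25 primes

print(grown, reset)  # 75 25
```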

Question

This leads me to the question: is there a direct way (i.e. in one statement) to 'reset' the values of all the global variables to what they were when they were first assigned in the module?

In this specific scenario, adding one statement to manually 'reset' primes works fine, but consider a module with many global variables, all of which you want to 'reset'.


Side quest-ion

Why doesn't the statement import test re-run the initial primes = [] assignment?


Solution

  • Let's start with your side question, because it turns out that it's actually central to everything:

    Why doesn't the statement import test re-run the initial primes = [] assignment?

    Because, as explained in the docs on the import system and the import statement, what import test does is, loosely, this pseudocode:

    if 'test' not in sys.modules:
        find, load (compiling if needed), and exec the module
        sys.modules['test'] = result
    test = sys.modules['test']
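The caching step of that pseudocode is easy to observe directly. A minimal sketch, assuming a hypothetical throwaway module cache_demo written to a temporary directory: the second import statement does not re-run primes = [], it just hands back the object already stored in sys.modules.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical throwaway module with a single module-level global.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "cache_demo.py"), "w") as f:
    f.write("primes = []\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

import cache_demo                 # first import: finds, loads, and executes the file
cache_demo.primes.append(2)       # mutate the module's global

import cache_demo as again        # second import: just a sys.modules lookup
print(again is cache_demo)                      # True
print(sys.modules["cache_demo"] is cache_demo)  # True
print(again.primes)                             # [2]: 'primes = []' did not run again
```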
    

    OK, but why does it do that?

    • If you have two modules that both import the same module, they expect to see the same globals. And remember that types, functions, etc. defined at the top level of a module are all globals. For example, if sortedlist.py imports collections.abc in order to write class SortedList(collections.abc.Sequence):, and scraper.py imports collections.abc in order to check isinstance(something, collections.abc.Sequence), you'd want a SortedList to pass that test. But it won't if those are two completely independent types, because they came from two different module objects that happen to have the same name.

    • If you have 12 modules that all import pandas as pd, you'd be running all the pandas initialization code 12 times. Worse, some of your modules probably also import each other, so they'd each be run multiple times, importing pandas again each time. How long do you think it would take to run all the pandas initialization 60 times?
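The first point can be demonstrated by deliberately bypassing the cache. The sketch below uses a hypothetical module sortedlist_demo containing a toy SortedList, and loads it twice with importlib.util.spec_from_file_location, module_from_spec and exec_module. That produces two module objects with the same name, and therefore two unrelated SortedList classes. Because collections.abc itself still comes from sys.modules, both copies agree on what a Sequence is.

```python
import collections.abc
import importlib.util
import os
import tempfile

# Hypothetical module source with a toy Sequence subclass.
SOURCE = '''
import collections.abc

class SortedList(collections.abc.Sequence):
    """Tiny stand-in for a sorted-list type."""
    def __init__(self, items=()):
        self._items = sorted(items)
    def __getitem__(self, index):
        return self._items[index]
    def __len__(self):
        return len(self._items)
'''

path = os.path.join(tempfile.mkdtemp(), "sortedlist_demo.py")
with open(path, "w") as f:
    f.write(SOURCE)

def load_fresh(name, path):
    """Execute the file into a brand-new module object, bypassing sys.modules."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

first = load_fresh("sortedlist_demo", path)
second = load_fresh("sortedlist_demo", path)

obj = first.SortedList([3, 1, 2])
print(isinstance(obj, collections.abc.Sequence))  # True: collections.abc is shared
print(first.SortedList is second.SortedList)      # False: two separate class objects
print(isinstance(obj, second.SortedList))         # False
```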


    So, reusing existing modules is almost always what you want.

    And when you don't, that's usually a sign that there's something wrong with your design (which may well be the case here).

    But "almost always" isn't "always". So there are ways around it. None of them are usually a good idea for live code, but for things like unit tests and benchmarking, there are three basic options that are all fine, as long as the tradeoffs are the ones you want:

    • del sys.modules['test']. This is obviously pretty hacky, but it actually does exactly what you want here. Any existing references to the old module are completely untouched, but the next time anyone does import test, they're going to get a brand-new test module.
    • importlib.reload(test). This sounds great, but on the one hand it may be overkill (notice that it forces the module's code to be recompiled, which you don't need), and on the other it may not be sufficient: it re-executes the module's top-level code inside the existing module object, so it doesn't actually reset the globals. If your code does primes = [] at the top level, that line gets executed again, so who cares; but if your code instead does, say, globals().setdefault('primes', []) inside the pi function, the old primes survives the reload, and you do care.
    • Instead of import test, manually perform all the import steps up through executing the module (see the examples in the importlib docs), but don't store the result in sys.modules['test'] or in test; just keep it in a local variable that you discard after each test. This is probably the cleanest option, although it takes about 6 lines of code instead of 1.
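Here is a minimal sketch of all three options plugged into timeit.repeat. It assumes a hypothetical module reset_demo whose pi(n) extends the global primes list, written to a temporary directory; the fresh_module helper is likewise hypothetical and follows the find_spec / module_from_spec / exec_module pattern from the importlib docs. Whatever runs in setup is excluded from the timing, so the cost of re-importing does not end up in the measurement.

```python
import importlib
import importlib.util
import os
import sys
import tempfile
import timeit

# Hypothetical stand-in for the question's 'test' module.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "reset_demo.py"), "w") as f:
    f.write("primes = []\n"
            "\n"
            "def pi(n):\n"
            "    primes.extend(range(n))  # grows the module-level global\n"
            "    return len(primes)\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

# Option 1: forget the cached module in setup, so 'import' executes the file again.
t1 = timeit.repeat("reset_demo.pi(10)",
                   setup="import sys\n"
                         "sys.modules.pop('reset_demo', None)\n"
                         "import reset_demo",
                   number=1, repeat=5)

# Option 2: reload in setup, re-running 'primes = []' inside the same module object.
t2 = timeit.repeat("reset_demo.pi(10)",
                   setup="import importlib, reset_demo\nimportlib.reload(reset_demo)",
                   number=1, repeat=5)

# Option 3: execute the module into a new object that never enters sys.modules.
def fresh_module(name):
    spec = importlib.util.find_spec(name)  # reuses the existing spec if already imported
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)        # run the top-level code, e.g. primes = []
    return module

t3 = timeit.repeat("mod.pi(10)", setup="mod = fresh_module('reset_demo')",
                   number=1, repeat=5, globals=globals())

print(min(t1) * 1000, min(t2) * 1000, min(t3) * 1000)  # milliseconds
```

Option 3 needs globals=globals() so that the setup string can see the fresh_module helper.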