
Relative importing and module reloading


I have a machine learning project and I have to conduct m.l. experiments. Each experiment is made with a module that is strongly dependent on other files in the repository. For the sake of reproducibility, I consider each experiment as a snapshot of the repository. So later I want to reproduce the experiment and compare it to another experiment in ONE script. To do this I restore the state of the repository at the moment of the experiment in special folder reproduce/experiment_name and organize the following structure:

├── launcher.py
├── package
│   ├── dependency.py
│   └── subpackage
│       └── module.py
└── reproduce
    ├── experiment_23
    │   └── package
    │       ├── dependency.py
    │       └── subpackage
    │           └── module.py
    └── experiment_24
        └── package
            ├── dependency.py
            └── subpackage
                └── module.py

So in launcher.py I want to import the content of experiment_23, given the path to the needed module, using importlib.util. But I need to keep its dependencies so that the old code works in its own realm. I came up with the idea of modifying sys.path: add the path right before the import and remove it afterwards. Here is the content of launcher.py:

import sys
import importlib.util


path_to_module = "/package/subpackage/module.py"
experiment_path = "/reproduce/experiment_23"
project_path = "/home/user/PycharmProjects/PathTest"

# Here I add path to experiment directory
sys.path.insert(0, project_path + experiment_path)

spec = importlib.util.spec_from_file_location("", project_path + experiment_path + path_to_module)
exp23 = importlib.util.module_from_spec(spec)
spec.loader.exec_module(exp23)

# Clear path
sys.path = sys.path[1:]
# This runs successfully
exp23.run()


# So now I want to import the newest version of module.py
# And this fails, meaning it executes code from exp23
import package.subpackage.module as exp
exp.run()

Content of module.py:

from package.dependency import function


def run():
    function()

And dependency.py, which serves as a success marker:

def function():
    print("23")

So the problem is that once the import has been made and the dependency loaded by from package.dependency import function, Python no longer looks for it in sys.path and uses some kind of cache instead. My question is: how can I avoid using the cache in this case, or at least achieve similar functionality with a different method?
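For reference, the cache in question is sys.modules: an import statement checks it before searching sys.path. The minimal sketch below reproduces the behaviour in isolation; cached_demo is a hypothetical module name written to two temporary folders, "old" and "new".

```python
import sys
import tempfile
from pathlib import Path

# Two throwaway folders, each holding a module with the same (hypothetical) name.
root = Path(tempfile.mkdtemp())
for folder, marker in [("old", "23"), ("new", "main")]:
    (root / folder).mkdir()
    (root / folder / "cached_demo.py").write_text(f"MARKER = {marker!r}\n")

# First import: found through sys.path, then stored in sys.modules.
sys.path.insert(0, str(root / "old"))
import cached_demo
sys.path.remove(str(root / "old"))

# Second import: sys.path now points at "new", but the name is already in
# sys.modules, so the cached "old" module is returned without any search.
sys.path.insert(0, str(root / "new"))
import cached_demo as again
first_marker = again.MARKER

# Evicting the entry forces a fresh search of sys.path.
del sys.modules["cached_demo"]
import cached_demo as fresh
second_marker = fresh.MARKER
sys.path.remove(str(root / "new"))

print(first_marker, second_marker)  # 23 main
```

Evicting entries from sys.modules works in this toy case, but any module that already imported the old object keeps its reference to it, so it is a fragile way to juggle two versions of the same package.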


Solution

  • Before anything else, I'll assume you have an __init__.py inside every package and subpackage directory to make them actual "packages" (as described in the Python documentation on packages).

    Now, notice that subpackage, which contains module.py, is an actual sub-package of package. So when you write from package.dependency import ..., you are importing from the parent package. Not only is this bad practice, but the path you use in launcher.py targets module.py directly and only, so module.py has no idea what package is, or even what subpackage is for that matter.

    What you should be targeting instead is package, which (again, assuming it has an __init__.py) will load as an actual package, making everything inside it properly accessible... after a few adjustments.
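To see why targeting module.py on its own is a problem, here is a minimal sketch using throwaway files in a temporary folder (pkg, dependency and module are illustrative names). It loads module.py directly while module.py uses a relative import of its parent's dependency.py. Because the spec is built for a bare module name, spec.parent is empty and the relative import has nothing to resolve against, so exec_module raises an ImportError.

```python
import importlib.util
import tempfile
from pathlib import Path

# Illustrative layout: pkg/dependency.py and pkg/subpackage/module.py,
# where module.py reaches its parent's dependency with a relative import.
root = Path(tempfile.mkdtemp())
sub = root / "pkg" / "subpackage"
sub.mkdir(parents=True)
(root / "pkg" / "__init__.py").write_text("")
(sub / "__init__.py").write_text("")
(root / "pkg" / "dependency.py").write_text("def function():\n    return 'dep'\n")
(sub / "module.py").write_text(
    "from ..dependency import function\n\n"
    "def run():\n    return function()\n"
)

# Target module.py on its own, the way the question's launcher.py does.
spec = importlib.util.spec_from_file_location("module", str(sub / "module.py"))
module = importlib.util.module_from_spec(spec)
try:
    spec.loader.exec_module(module)
    error = None
except ImportError as exc:
    # spec.parent is "" here: the module is not inside any package,
    # so ".." has nothing to resolve against.
    error = exc
```

With an absolute import instead, as in the question, the lookup does not fail; it simply falls back to sys.modules and sys.path, which is exactly where the old and new code get mixed up.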

    In your __init__.py files, you should expose everything you want that package to contain. For example, your package/__init__.py should contain the line from . import subpackage. This not only lets you write things like package.subpackage, it is also what actually loads the contents of subpackage when package is imported.

    Following the same logic, in your subpackage/__init__.py there should be from . import module, loading and exposing module.py, allowing you to do package.subpackage.module.
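A minimal sketch of the two __init__.py files described above, written to a temporary folder under the hypothetical name demo_pkg (module.py is reduced to a stub run function). Importing only the top-level package is enough to reach demo_pkg.subpackage.module.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package "demo_pkg" with the same shape as package/subpackage/module.py.
root = Path(tempfile.mkdtemp())
sub = root / "demo_pkg" / "subpackage"
sub.mkdir(parents=True)
(root / "demo_pkg" / "__init__.py").write_text("from . import subpackage\n")
(sub / "__init__.py").write_text("from . import module\n")
(sub / "module.py").write_text("def run():\n    return 'run'\n")

sys.path.insert(0, str(root))
try:
    # Importing only the top-level package runs both __init__.py files,
    # which load subpackage and then module.
    demo_pkg = importlib.import_module("demo_pkg")
finally:
    sys.path.remove(str(root))

result = demo_pkg.subpackage.module.run()
print(result)  # run
```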

    This should be all you need to properly load and access everything you want... except that you also wrap all of it in experiment_n folders. You have two options here: make experiment_n a package with its own __init__.py that exposes its package, and target that with importlib; or rename package to experiment_n and delete the top folder. That is, going from this:

    experiment_23
        └── package
            ├── __init__.py
            ├── dependency.py
            └── subpackage
                ├── __init__.py
                └── module.py
    

    To this:

    experiment_23
        ├── __init__.py
        ├── dependency.py
        └── subpackage
            ├── __init__.py
            └── module.py
    

    Let's take a break from folder editing and check out how you'd use our current iteration in code. Consider the following launcher.py:

    import os
    import sys
    import importlib.util
    
    current_directory = os.path.dirname(os.path.realpath(__file__))
    
    # Target the package's __init__.py: spec_from_file_location needs a file, not a folder
    path_to_experiment = os.path.join(current_directory, 'reproduce', 'experiment_23', '__init__.py')
    
    # An empty submodule_search_locations list marks the spec as a package
    spec = importlib.util.spec_from_file_location('exp23', path_to_experiment, submodule_search_locations=[])
    
    exp23 = importlib.util.module_from_spec(spec)
    
    # Register the package so relative imports inside it can find their parent
    sys.modules[exp23.__name__] = exp23
    
    # old one
    spec.loader.exec_module(exp23)
    
    exp23.subpackage.module.run()
    
    # new one
    import package.subpackage.module as exp
    
    exp.run()
    

    Here, the main package's function prints main and the reproduced one prints exp23.

    First, we compose path_to_experiment, pointing to the package's __init__.py. spec_from_file_location needs a file, not a folder, which is why we don't target experiment_23 directly. Next, with an actual name (exp23), we get our spec from spec_from_file_location, setting submodule_search_locations to an empty list to indicate that it's a package (as described in the importlib documentation); importlib then fills that list with the directory containing __init__.py, which becomes the package's __path__. Finally, we create the module object with module_from_spec.
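The submodule_search_locations detail can be checked in isolation. A minimal sketch, assuming a throwaway folder with an empty __init__.py and the hypothetical module name exp23_demo:

```python
import importlib.util
import os
import tempfile

# A throwaway folder with an empty __init__.py stands in for experiment_23.
folder = tempfile.mkdtemp()
init_file = os.path.join(folder, "__init__.py")
with open(init_file, "w") as f:
    f.write("")

# Passing an empty list marks the spec as a package; importlib then appends
# the directory that contains __init__.py.
spec = importlib.util.spec_from_file_location(
    "exp23_demo", init_file, submodule_search_locations=[]
)
pkg = importlib.util.module_from_spec(spec)

print(spec.parent)  # exp23_demo (a package is its own parent)
print(len(pkg.__path__), os.path.samefile(pkg.__path__[0], folder))  # 1 True
```

That one-element __path__ is what lets the package's relative imports find subpackage and dependency.py inside experiment_23.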

    It's important that we add our new module to sys.modules so relative imports (which we use) can find their parent module.
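A minimal sketch of why that registration matters, using a hypothetical snapshot package whose __init__.py does from . import helper. Without the sys.modules entry, the relative import tries to import the parent package by name, finds nothing on sys.path, and raises an ImportError; with the entry, it succeeds.

```python
import importlib.util
import os
import sys
import tempfile

# Hypothetical snapshot package whose __init__.py uses a relative import.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "__init__.py"), "w") as f:
    f.write("from . import helper\n")
with open(os.path.join(folder, "helper.py"), "w") as f:
    f.write("VALUE = 'exp23'\n")


def load(name, register):
    spec = importlib.util.spec_from_file_location(
        name, os.path.join(folder, "__init__.py"), submodule_search_locations=[]
    )
    module = importlib.util.module_from_spec(spec)
    if register:
        # Relative imports look the parent package up by name in sys.modules.
        sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


try:
    load("snapshot_unregistered", register=False)
    unregistered_error = None
except ImportError as exc:
    unregistered_error = exc

registered = load("snapshot_registered", register=True)
print(unregistered_error is not None, registered.helper.VALUE)  # True exp23
```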

    We then let our module execute and call both its run function and the one from our current package:

    main
    main
    

    It seems it didn't work: we wanted the first line to be exp23.

    The culprit is the line from package.dependency import function in experiment_23/subpackage/module.py. That import is absolute, not relative, so Python resolves the top-level name package: it reuses the module already stored under that name in sys.modules or, if there is none yet, imports it from sys.path and stores it there. Here, that resolves to our main package. Changing the line to from ..dependency import function and re-running launcher.py gives us:

    exp23
    main
    

    Success :)
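Putting the pieces together, here is a self-contained sketch that rebuilds the layout in a temporary folder and runs both versions side by side. It is a variation rather than the exact project: dependency.function returns its marker instead of printing it so the result can be checked, and the write and make_package helpers are hypothetical.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def write(path, text):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


def make_package(root, marker):
    # __init__.py files expose their children; module.py uses a relative import.
    write(os.path.join(root, "__init__.py"), "from . import subpackage\n")
    write(os.path.join(root, "dependency.py"), f"def function():\n    return {marker!r}\n")
    write(os.path.join(root, "subpackage", "__init__.py"), "from . import module\n")
    write(os.path.join(root, "subpackage", "module.py"),
          "from ..dependency import function\n\ndef run():\n    return function()\n")


project = tempfile.mkdtemp()
make_package(os.path.join(project, "package"), "main")
make_package(os.path.join(project, "reproduce", "experiment_23"), "exp23")

# Load the snapshot under its own name so it never touches "package".
init_path = os.path.join(project, "reproduce", "experiment_23", "__init__.py")
spec = importlib.util.spec_from_file_location("exp23", init_path, submodule_search_locations=[])
exp23 = importlib.util.module_from_spec(spec)
sys.modules[exp23.__name__] = exp23
spec.loader.exec_module(exp23)

# Import the current code the normal way, through sys.path.
sys.path.insert(0, project)
try:
    exp = importlib.import_module("package.subpackage.module")
finally:
    sys.path.remove(project)

old_result = exp23.subpackage.module.run()
new_result = exp.run()
print(old_result, new_result)  # exp23 main
```

Because the snapshot lives under the name exp23 and only uses relative imports, its modules are cached as exp23.* in sys.modules and never shadow package.*.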

    The import system can be pretty confusing; spending some time with the import system docs and the importlib docs will save you a lot of headaches when a similar situation arises.

    As an optional suggestion, you can move module.py out of subpackage and directly into experiment_23, then delete subpackage. You can then write from .dependency import function. Some would argue that multi-level relative imports (using more than one dot) are bad practice, or at least a sign of insufficient understanding of package structuring.

    Edit: All changes applied to experiment_23 apply likewise to the main package and to every other experiment_n.