I have a machine learning project in which I have to conduct ML experiments. Each experiment is run with a module that depends heavily on other files in the repository. For the sake of reproducibility, I treat each experiment as a snapshot of the repository. Later I want to reproduce an experiment and compare it to another experiment in ONE script. To do this, I restore the state of the repository at the moment of the experiment into a special folder, reproduce/experiment_name, and organize the following structure:
├── launcher.py
├── package
│   ├── dependency.py
│   └── subpackage
│       └── module.py
└── reproduce
    ├── experiment_23
    │   └── package
    │       ├── dependency.py
    │       └── subpackage
    │           └── module.py
    └── experiment_24
        └── package
            ├── dependency.py
            └── subpackage
                └── module.py
So in launcher.py I want to import the contents of experiment_23 with importlib.util, given the path to the needed module. But I need to keep its dependencies so that the old code works in its own realm. I came up with the idea of modifying sys.path: add the experiment's path right before the import and remove it afterwards. Here is the content of launcher.py:
import sys
import importlib.util
path_to_module = "/package/subpackage/module.py"
experiment_path = "/reproduce/experiment_23"
project_path = "/home/user/PycharmProjects/PathTest"
# Here I add path to experiment directory
sys.path.insert(0, project_path + experiment_path)
spec = importlib.util.spec_from_file_location("", project_path + experiment_path + path_to_module, )
exp23 = importlib.util.module_from_spec(spec)
spec.loader.exec_module(exp23)
# Clear path
sys.path = sys.path[1:]
# This runs successfully
exp23.run()
# So now I want to import my newest version of module.py
# And this fails, meaning it executes code from exp23
import package.subpackage.module as exp
exp.run()
Content of module.py:
from package.dependency import function
def run():
    function()
and dependency.py, which is a success marker:
def function():
    print("23")
So the problem is that once the import has been made and the dependency loaded by from package.dependency import function, Python no longer looks for it in sys.path and uses some kind of cache instead. My question is: how can I avoid using that cache in this case, or at least, how can I achieve similar functionality by other means?
Before anything else, I'll assume you have an __init__.py inside every package and subpackage directory to make them actual "packages", as specified here.
Now, notice that subpackage, which contains module.py, is an actual sub-package of package. So when you write from package.dependency import ..., you are importing from the parent package. Not only is this bad practice, but the path you use in launcher.py targets module.py only and directly, and module.py has no idea what package is, or even subpackage for that matter.
What you should be targeting is package, which (again, assuming it contains an __init__.py) will load as an actual package, making everything inside it properly accessible... after a few adjustments.
In your __init__.py files you should expose everything you want the package to contain. For example, your package/__init__.py should contain the line from . import subpackage. This not only lets you write package.subpackage, it is also what actually loads the contents of subpackage.
Following the same logic, your subpackage/__init__.py should contain from . import module, which loads and exposes module.py and lets you write package.subpackage.module.
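To see what those two lines buy you, here is a minimal sketch that writes a throwaway copy of the layout into a temporary directory and explicitly imports only package. The temporary directory, the FILES mapping and the 'main' return value are assumptions made just for this demo (your dependency.py prints instead of returning).

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical file contents mirroring package/ from the tree above.
FILES = {
    "package/__init__.py": "from . import subpackage\n",
    "package/dependency.py": "def function():\n    return 'main'\n",
    "package/subpackage/__init__.py": "from . import module\n",
    "package/subpackage/module.py": (
        "from package.dependency import function\n"
        "\n"
        "def run():\n"
        "    return function()\n"
    ),
}


def build(root: Path) -> None:
    """Write every file in FILES below root, creating folders as needed."""
    for relative, text in FILES.items():
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)


with tempfile.TemporaryDirectory() as tmp:
    build(Path(tmp))
    importlib.invalidate_caches()
    sys.path.insert(0, tmp)
    try:
        package = importlib.import_module("package")
        # Only "package" was imported explicitly; the __init__.py files
        # pulled in subpackage and module on the way.
        result = package.subpackage.module.run()
    finally:
        sys.path.remove(tmp)

print(result)  # main
```

Without the two from . import lines, package.subpackage would raise AttributeError here until something imported the submodules explicitly.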
This should be all you need to properly load and access everything you want... except that you also wrap all of it in experiment_n folders. Here you have two options: you can make experiment_n a package with its own __init__.py, expose its package and target that with importlib, or you can rename package to experiment_n and delete the top folder, going from this:
experiment_23
└── package
    ├── __init__.py
    ├── dependency.py
    └── subpackage
        ├── __init__.py
        └── module.py
To this:
experiment_23
├── __init__.py
├── dependency.py
└── subpackage
    ├── __init__.py
    └── module.py
Let's take a break from folder editing and see how you would use the current iteration in code. Consider the following launcher.py:
import os
import sys
import importlib.util
current_directory = os.path.dirname(os.path.realpath(__file__))
path_to_experiment = os.path.join(current_directory, 'reproduce/experiment_23')
path_to_experiment = os.path.join(path_to_experiment, '__init__.py')  # package!
# An explicit name ('exp23') plus an empty search-location list marks this as a package
spec = importlib.util.spec_from_file_location('exp23', path_to_experiment, submodule_search_locations=[])
exp23 = importlib.util.module_from_spec(spec)
# Register before executing so relative imports can find their parent package
sys.modules[exp23.__name__] = exp23
# old one
spec.loader.exec_module(exp23)
exp23.subpackage.module.run()
# new one
import package.subpackage.module as exp
exp.run()
Here, the main package's function prints main and the reproduced one prints exp23.
First, we compose path_to_experiment, pointing to our package's __init__.py. spec_from_file_location needs a file, not a folder, which is why we don't target /experiment_23 directly. Next, with an actual name (exp23), we get our spec with spec_from_file_location, setting submodule_search_locations to an empty list to indicate that it's a package, as specified here; spec_from_file_location then fills that list with the directory containing __init__.py, so sibling files can be found. Last, we create our module with module_from_spec.
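The effect of submodule_search_locations=[] can be checked directly. In this sketch the temporary directory and the name exp23 are stand-ins for the real experiment folder; it only inspects the spec and the resulting module, it does not execute anything.

```python
import importlib.util
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    init_file = os.path.join(tmp, "__init__.py")
    with open(init_file, "w") as f:
        f.write("")  # an empty package body is enough for this check

    spec = importlib.util.spec_from_file_location(
        "exp23", init_file, submodule_search_locations=[]
    )
    # The empty list was filled in with the package's own directory.
    search_path = list(spec.submodule_search_locations)
    same_dir = len(search_path) == 1 and os.path.samefile(search_path[0], tmp)

    module = importlib.util.module_from_spec(spec)
    # module_from_spec copies that list to __path__, which is what makes it a package.
    is_package = hasattr(module, "__path__")

print(same_dir, is_package)  # True True
```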
It's important that we add our new module to sys.modules before executing it, so that relative imports (which we use) can find their parent module.
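A quick way to see why the registration matters is to load a tiny package whose __init__.py uses a relative import, once without and once with the sys.modules entry. This is a sketch: the load helper and the name exp_demo are hypothetical, and the unregistered attempt is expected to fail with an ImportError because "." resolves to exp_demo, which Python then cannot find.

```python
import importlib.util
import os
import sys
import tempfile


def load(init_file, name, register):
    """Hypothetical helper: load init_file as package `name`, optionally registering it."""
    spec = importlib.util.spec_from_file_location(
        name, init_file, submodule_search_locations=[]
    )
    module = importlib.util.module_from_spec(spec)
    if register:
        # Relative imports resolve "." to this name and look it up in sys.modules.
        sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    init_file = os.path.join(tmp, "__init__.py")
    with open(init_file, "w") as f:
        f.write("from . import dependency\n")
    with open(os.path.join(tmp, "dependency.py"), "w") as f:
        f.write("VALUE = 'exp23'\n")

    try:
        load(init_file, "exp_demo", register=False)
        unregistered = None
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        unregistered = type(exc)

    registered = load(init_file, "exp_demo", register=True).dependency.VALUE
    sys.modules.pop("exp_demo", None)
    sys.modules.pop("exp_demo.dependency", None)

print(unregistered, registered)
```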
We let our module execute and then call the run function of both the old and the new module:
main
main
It seems it didn't work: we wanted the first one to print exp23.
The culprit is the from package.dependency import function line in our experiment_23/subpackage/module.py. That import is absolute, not relative, so it looks up "package" in sys.modules (or imports it into sys.modules if it isn't there yet) and uses whatever it finds. In this case, that is our main package. Changing it to from ..dependency import function and re-running launcher.py gives us:
exp23
main
Success :)
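The cache from the question is sys.modules: once a name like package is in it, every later absolute import of that name is answered from the dictionary and sys.path is not searched again, whichever copy got there first. Here is a small sketch reproducing that with two throwaway copies of a hypothetical package in temporary directories (the make_package helper and the return values are illustrative only):

```python
import importlib
import os
import sys
import tempfile


def make_package(root, value):
    """Write root/package/__init__.py and root/package/dependency.py."""
    pkg = os.path.join(root, "package")
    os.makedirs(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "dependency.py"), "w") as f:
        f.write(f"def function():\n    return {value!r}\n")


with tempfile.TemporaryDirectory() as main_root, \
        tempfile.TemporaryDirectory() as exp_root:
    make_package(main_root, "main")
    make_package(exp_root, "23")

    sys.path.insert(0, main_root)
    first = importlib.import_module("package.dependency").function()

    # Put the experiment copy first on sys.path: the import is still
    # served from sys.modules, so the main copy comes back.
    sys.path.insert(0, exp_root)
    second = importlib.import_module("package.dependency").function()

    # Clean up so the demo leaves no trace in this interpreter.
    sys.path.remove(exp_root)
    sys.path.remove(main_root)
    for name in [n for n in sys.modules if n == "package" or n.startswith("package.")]:
        del sys.modules[name]

print(first, second)  # main main
```

The relative import avoids the clash because inside the experiment it resolves against exp23, a name nothing else has claimed.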
The import system can be pretty confusing; spending some time with the import system docs and the importlib docs will save you a lot of headaches when a similar situation arises.
As an optional suggestion, you can move module.py into experiment_23, out of subpackage, and delete subpackage. Then you can write from .dependency import function. Some would argue that super-relative imports (using more than one .) are bad practice, or at least show an insufficient understanding of package structuring.
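Putting the pieces together, here is a sketch of that flattened layout loaded next to a main package with the same steps as launcher.py. It builds both trees in a temporary directory so it runs anywhere; the file contents, the write_package and load_experiment helpers and the captured output are assumptions for the demo, and in the real project the files already exist and the launcher's own directory is already on sys.path.

```python
import contextlib
import importlib
import importlib.util
import io
import os
import sys
import tempfile


def write_package(directory, marker):
    """Hypothetical helper: flattened layout with module.py next to dependency.py."""
    os.makedirs(directory)
    sources = {
        "__init__.py": "from . import module\n",
        "dependency.py": f"def function():\n    print({marker!r})\n",
        "module.py": "from .dependency import function\n\ndef run():\n    function()\n",
    }
    for filename, text in sources.items():
        with open(os.path.join(directory, filename), "w") as f:
            f.write(text)


def load_experiment(name, directory):
    """Same steps as launcher.py: spec from __init__.py, register, execute."""
    init_file = os.path.join(directory, "__init__.py")
    spec = importlib.util.spec_from_file_location(
        name, init_file, submodule_search_locations=[]
    )
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as root:
    experiment_dir = os.path.join(root, "reproduce", "experiment_23")
    write_package(os.path.join(root, "package"), "main")
    write_package(experiment_dir, "exp23")
    sys.path.insert(0, root)  # stands in for the launcher's own directory

    output = io.StringIO()
    with contextlib.redirect_stdout(output):
        exp23 = load_experiment("exp23", experiment_dir)
        exp23.module.run()  # old one
        main = importlib.import_module("package")
        main.module.run()  # new one
    sys.path.remove(root)

print(output.getvalue().split())  # ['exp23', 'main']
```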
Edit: All changes applied to experiment_23 apply equally to the main package and to every other experiment_n...