Tags: python, python-import, python-packaging, python-importlib

How do I dynamically define a package-level __all__ without running any slow code in the package's constituent modules?


TL;DR: How do I read a module's __all__ definition and dynamically add it to the package-level __init__.py without actually running any slow code in the module itself?

I am writing a library and have a package structure not unlike this:

library/

    package1/
        __init__.py        # sub-package __init__
        _module_a.py
        _module_b.py

    package2/
        __init__.py        # package-level __init__
        subpackage/
            __init__.py    # sub-package __init__
            _module_d.py
            _module_e.py
            _module_f.py
        _module_g.py

    __init__.py            # Library level __init__

I use a '_' prefix on all my modules because I want to tightly control what the user sees whenever they call something like dir(library.package1). To that end, I make sure every such module defines an __all__ list.

For example,

"""Inside of _module_e.py"""
import time
__all__ = ["Foo", "Bar"]


# do computationally intensive stuff
time.sleep(5)


class Foo:    
    pass

class Bar:
    pass

and

"""Inside of _module_f.py"""
import time

__all__ = ["Baz"]

# do more stuff that takes a long time
time.sleep(5)


class Baz:
    pass
    

To make sure that time isn't wasted running all the computationally expensive code, a user wanting to use the Baz class might normally write

from library.package2.subpackage._module_f import Baz

but I think this is way clunkier than writing something nice like from library.package2.subpackage import Baz. Clearly, then, I have to do something in the sub-package's __init__.py file to enable this desired import behaviour.

Without restructuring my files, is it possible to dynamically import modules as and when they are needed? Should I restructure/refactor my files in some way? Is there some other approach I'm missing?

I know I can define a __getattr__(name) in the __init__.py file and use importlib to dynamically import from a module, but that still requires me to hand-copy the contents of each module's __all__ list into the __all__ list of the __init__.py file, like below:

"""Inside of subpackage/__init__.py"""
import importlib

# I have to create the below dictionary and maintain it manually!!!
defined_classes = {
    "Foo": "_module_e",
    "Bar": "_module_e",
    "Baz": "_module_f"
}

__all__ = [] + list(defined_classes.keys())


def __dir__():
    return __all__


def __getattr__(name):
    if name in defined_classes:
        file = defined_classes[name]
        return getattr(_importlib.import_module(f'library.package2.subpackage.{file}'), name)
    else:
        try:
            return globals()[name]
        except KeyError:
            raise AttributeError(f"Module 'subpackage' has no attribute '{name}'")
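
Module-level __getattr__ and __dir__ hooks like these are supported since Python 3.7 (PEP 562). Assuming the tree above, a session then behaves something like this (only the module actually touched gets executed):

>>> from library.package2 import subpackage     # cheap: no _module_* runs yet
>>> "Foo" in dir(subpackage)                    # __dir__ answers without importing _module_e
True
>>> subpackage.Baz                              # first access imports _module_f (its 5 s sleep runs once)
<class 'library.package2.subpackage._module_f.Baz'>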

I'm sure I could write a quick-and-dirty function that opens each module file and parses its lines until it finds something that looks like an __all__ list, and use that to procedurally generate my defined_classes mapping, but I don't know what the best way of doing this is (or whether Python already ships a better solution).
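
For what it's worth, the standard library's ast module can do that parsing without executing any module code: ast.parse reads the source into a syntax tree and ast.literal_eval evaluates the list literal. A minimal sketch (collect_all_names is my name for the helper, and it assumes every __all__ is assigned once, at the top level, as a literal list of strings):

"""Sketch for subpackage/__init__.py: build defined_classes statically."""
import ast
import pathlib

def collect_all_names(package_dir):
    """Map each name in a module's __all__ to that module's stem, without importing it."""
    mapping = {}
    for path in sorted(pathlib.Path(package_dir).glob("_module_*.py")):
        for node in ast.parse(path.read_text()).body:
            # match a top-level statement of the form:  __all__ = ["Foo", "Bar"]
            if (isinstance(node, ast.Assign)
                    and any(isinstance(t, ast.Name) and t.id == "__all__"
                            for t in node.targets)):
                for name in ast.literal_eval(node.value):
                    mapping[name] = path.stem
    return mapping

defined_classes = collect_all_names(pathlib.Path(__file__).parent)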


Solution

  • I came to the conclusion that it was probably best to refactor my modules instead, so that the slow code only runs inside functions that are called (and cached) as needed - the burden of deciding when to run this code shouldn't fall on __init__.py. A rough sketch of that refactor follows below.

    Another benefit of not messing with __init__.py is that my IDE better understands what's going on and provides me with the relevant tooltips.
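
As a rough illustration of that refactor (the names here are illustrative, not taken from the actual library), the slow work moves into a module-level helper cached with functools.cache, so importing the module is instant and the 5-second cost is paid once, on first use:

"""Inside of _module_f.py, after the refactor"""
import functools
import time

__all__ = ["Baz"]


@functools.cache  # use functools.lru_cache(maxsize=None) on Python < 3.9
def _expensive_setup():
    """Run the slow computation once; later calls return the cached result."""
    time.sleep(5)  # stand-in for the computationally intensive work
    return "precomputed state"


class Baz:
    def __init__(self):
        # the cost is paid on the first Baz(), not at import time
        self._state = _expensive_setup()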