What is the correct boilerplate for explicit relative imports?


PEP 366 (Main module explicit relative imports), which introduced the module-scope variable __package__ to allow explicit relative imports in submodules, contains the following excerpt:

When the main module is specified by its filename, then the __package__ attribute will be set to None. To allow relative imports when the module is executed directly, boilerplate similar to the following would be needed before the first relative import statement:

if __name__ == "__main__" and __package__ is None:
    __package__ = "expected.package.name"

Note that this boilerplate is sufficient only if the top level package is already accessible via sys.path. Additional code that manipulates sys.path would be needed in order for direct execution to work without the top level package already being importable.

This approach also has the same disadvantage as the use of absolute imports of sibling modules - if the script is moved to a different package or subpackage, the boilerplate will need to be updated manually. It has the advantage that this change need only be made once per file, regardless of the number of relative imports.

I have tried to use this boilerplate in the following setting:

  • Directory layout:

    foo
    ├── bar.py
    └── baz.py
    
  • Contents of the bar.py submodule:

    if __name__ == "__main__" and __package__ is None:
        __package__ = "foo"
    
    from . import baz
    

The boilerplate works when executing the submodule bar.py from the file system (the PYTHONPATH modification makes the package foo/ accessible on sys.path):

PYTHONPATH=$(pwd) python3 foo/bar.py

The boilerplate also works when executing the submodule bar.py from the module namespace:

python3 -m foo.bar

However, the following alternative boilerplate, used as the contents of the bar.py submodule, works just as well in both cases:

if __package__:
    from . import baz
else:
    import baz

Furthermore, this alternative boilerplate is simpler and does not require any update of the submodule bar.py when it is moved together with the submodule baz.py to a different package (since it does not hard-code the package name "foo").

So here are my questions about the boilerplate of PEP 366:

  1. Is the first subexpression __name__ == "__main__" necessary or is it already implied by the second subexpression __package__ is None?
  2. Shouldn’t the second subexpression __package__ is None be not __package__ instead, in order to handle the case where __package__ is the empty string (like in a __main__.py submodule executed from the file system by supplying the containing directory: PYTHONPATH=$(pwd) python3 foo/)?

Solution

  • The correct boilerplate is none: just write the explicit relative import and let the exception escape if someone tries to run the module as a script or has sys.path misconfigured:

    from . import baz
    

    The boilerplate given in PEP 366 is only there to show that the proposed change is sufficient to let users make direct execution* work if they really want to. It is not intended to suggest that making direct execution work is a good idea. It is a bad idea that will almost inevitably cause other problems, even with the boilerplate from the PEP.

    Your proposed alternative boilerplate recreates the problem caused by implicit relative imports in Python 2: the "baz" module gets imported as baz from __main__, but will be imported as "foo.baz" everywhere else, so you end up with two copies in sys.modules under different names.

    Amongst other problems, this means that if some other module throws foo.baz.SomeException and your __main__ module tries to catch baz.SomeException, it won’t work, as those will be two different exception objects coming from two different modules.
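The duplication can be reproduced in a single process. The following is a minimal sketch, not part of the original answer: it assumes a throwaway package built in a temporary directory (with an empty __init__.py added for determinism, which the layout in the question omits) and a hypothetical helper named import_baz_twice. It puts both the package directory and its parent on sys.path, which is the situation created by running foo/bar.py directly with PYTHONPATH pointing at the parent directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_baz_twice():
    """Import one source file under two names: 'baz' and 'foo.baz'."""
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root) / "foo"
        pkg.mkdir()
        # Assumption: an empty __init__.py, added only to keep the demo deterministic.
        (pkg / "__init__.py").write_text("")
        (pkg / "baz.py").write_text("class SomeException(Exception):\n    pass\n")
        # Script directory first, then the parent directory (the PYTHONPATH entry).
        sys.path[:0] = [str(pkg), root]
        try:
            importlib.invalidate_caches()
            top_level = importlib.import_module("baz")      # what "import baz" does
            submodule = importlib.import_module("foo.baz")  # what everyone else does
        finally:
            # Undo the sys.path and sys.modules changes made by this demo.
            del sys.path[:2]
            for name in ("baz", "foo", "foo.baz"):
                sys.modules.pop(name, None)
    return top_level, submodule


top_level, submodule = import_baz_twice()
print(top_level.__name__, submodule.__name__)  # baz foo.baz
print(top_level is submodule)                  # False

try:
    raise submodule.SomeException("raised as foo.baz.SomeException")
except top_level.SomeException:
    caught_by_other_name = True
except Exception:
    caught_by_other_name = False
print(caught_by_other_name)                    # False
```

The two module objects come from the same file but are distinct, so their SomeException classes are unrelated and the first except clause never matches.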

    By contrast, if you use the PEP boilerplate, then __main__ will correctly import baz as "foo.baz", and the only thing you have to worry about is other modules potentially importing foo.bar.

    If you want simpler boilerplate that explicitly guards against the "inadvertently making two copies of the same module under a different name" bug without hardcoding the package name, then you can use this:

    if not __package__:
        raise RuntimeError(f"{__file__} must be imported as a package submodule")
    

    However, if you are going to do that, you can just as well do from . import baz unconditionally as suggested above, and let the underlying exception escape if someone tries to run the script directly instead of via the -m switch.
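What "letting the underlying exception escape" looks like can be checked with a short script. This is a sketch under stated assumptions rather than anything from the PEP: it builds the foo package in a temporary directory (again with an assumed empty __init__.py), writes a bar.py containing only the unconditional relative import, and runs it both ways in child interpreters. The helper name run_bar_both_ways is hypothetical.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_bar_both_ways():
    """Run foo/bar.py by file path and via -m, returning both results."""
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root) / "foo"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")  # assumption: regular package
        (pkg / "baz.py").write_text("")
        (pkg / "bar.py").write_text("from . import baz\nprint(baz.__name__)\n")
        # Same effect as PYTHONPATH=$(pwd) in the question.
        env = dict(os.environ, PYTHONPATH=root)
        options = dict(cwd=root, env=env, capture_output=True, text=True)
        direct = subprocess.run([sys.executable, "foo/bar.py"], **options)
        as_module = subprocess.run([sys.executable, "-m", "foo.bar"], **options)
    return direct, as_module


direct, as_module = run_bar_both_ways()
print(direct.returncode != 0)             # True: direct execution fails
print(direct.stderr.strip().splitlines()[-1])
print(as_module.stdout.strip())           # foo.baz
```

The last line of the traceback from the direct run is an ImportError about the missing parent package, which already tells the user what went wrong without any extra guard code.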


    * Direct execution means executing code from:

    1. A file path argument except directory and zip file paths (python <file path>).
    2. A -c argument (python -c <code>).
    3. The interactive interpreter (python).
    4. Standard input (python < <file path>).

    Indirect execution means executing code from:

    5. A directory or zip file path argument (python <directory or zip file path>).
    6. A -m argument (python -m <module name>).
    7. An import statement (import <module name>).
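The values of __name__ and __package__ under several of these execution modes can be inspected directly. The sketch below is illustrative only: it assumes a temporary foo package with an empty __init__.py, a probe script that just prints the two variables, and a hypothetical helper named probe_modes. It covers a file path argument, a directory argument, the -m switch, and an import statement.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Each probed file only reports how the interpreter set it up.
PROBE = "print(repr(__name__), repr(__package__))\n"


def probe_modes():
    """Return the probe output for four ways of executing code in foo/."""
    results = {}
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root) / "foo"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")  # assumption: regular package
        (pkg / "bar.py").write_text(PROBE)
        (pkg / "__main__.py").write_text(PROBE)
        commands = {
            "file path": [sys.executable, "foo/bar.py"],
            "directory": [sys.executable, "foo"],
            "-m switch": [sys.executable, "-m", "foo.bar"],
            "import": [sys.executable, "-c", "import foo.bar"],
        }
        env = dict(os.environ, PYTHONPATH=root)
        for label, command in commands.items():
            done = subprocess.run(command, cwd=root, env=env,
                                  capture_output=True, text=True, check=True)
            results[label] = done.stdout.strip()
    return results


results = probe_modes()
for label, output in results.items():
    print(f"{label:10} {output}")
```

On the Python 3 versions this was written against, the file path run reports __package__ as None, the directory run reports the empty string, and the -m run reports 'foo'; only the import case has a __name__ other than '__main__'.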

    Now to answer your questions specifically:

    1. Is the first subexpression __name__ == "__main__" necessary or is it already implied by the second subexpression __package__ is None?

    It is hard to get __package__ is None anywhere other than the __main__ module with the modern import system. It used to be a lot more common, because rather than being set by the import system on module load, __package__ was set lazily by the first explicit relative import executed in the module. In other words, the boilerplate is only trying to let direct execution work (cases 1 to 4 above), but __package__ is None used to be true both for direct execution and for an import statement (case 7 above). The subexpression __name__ == "__main__" (true for cases 1 to 6 above) was therefore necessary to filter out case 7.

    2. Shouldn’t the second subexpression __package__ is None be not __package__ instead, in order to handle the case where __package__ is the empty string (like in a __main__.py submodule executed from the file system by supplying the containing directory: PYTHONPATH=$(pwd) python3 foo/)?

    No, because the boilerplate is only trying to let direct execution work (cases 1 to 4 above); it is not trying to let other flavours of sys.path misconfiguration pass silently.