
How to avoid the "Duplicate Module Import Trap" in Python?


I recently encountered a perplexing issue regarding module imports in Python, particularly with the way instances of classes can be duplicated based on different import styles.

It took some effort to reproduce it in a minimal example.

Main issue

Depending on the import style, a module-level variable can be duplicated (instantiated several times), even without a cyclic import. This is very hard to notice at run time and can be tedious to debug.

Main question

What is the best practice to avoid this issue?

Illustration

Here is a simple project:

PROJECT ROOT FOLDER
└───app
    │   main.py
    │
    └───websocket
            a.py
            b.py
            ws.py
            __init__.py

main.py

import sys


def log_modules():
    print("\n\nModules currently in cache:")
    for module_name in sys.modules:
        if ("web" in module_name or "ws" in module_name) and ("windows" not in module_name and "asyncio" not in module_name):
            print(f" - {module_name}: {id(sys.modules[module_name])}")


from app.websocket.a import a
log_modules()
from app.websocket.b import b
log_modules()


if __name__ == "__main__":
    a()
    b()

ws.py


class ConnectionManager:
    def __init__(self):
        print(f"New ConnectionManager object created, id: {id(self)}")
        self.caller_list = []

    def use(self, caller):
        self.caller_list.append(caller)
        print(f"ConnectionManager object used by {caller}, id: {id(self)}. Callers = {self.caller_list}")


websocket_manager = ConnectionManager()

a.py

from websocket.ws import websocket_manager  # <= one import style: legitimate


def a():
    websocket_manager.use("a")

b.py

from .ws import websocket_manager  # <= another import style: legitimate also


def b():
    websocket_manager.use("b")

It outputs:

New ConnectionManager object created, id: 1553357629648
New ConnectionManager object created, id: 1553357630608
ConnectionManager object used by a, id: 1553357629648. Callers = ['a']
ConnectionManager object used by b, id: 1553357630608. Callers = ['b']

We would expect only one ConnectionManager instance, but two are created.

I believe both imports are legitimate, especially in a development team where different styles may coexist (the issue occurs even if the team would rather not mix them).

The question is: what best practice can be applied blindly to avoid this?

Additional question: the module logs show:

Modules currently in cache:
 - app.websocket: 1553355041312
 - websocket: 1553357652672
 - websocket.ws: 1553357652512    <= state after "from app.websocket.a import a" in main.py
 - app.websocket.a: 1553355040112
 - app.websocket.ws: 1553357653632
 - app.websocket.b: 1553357652112  <= state after "from app.websocket.b import b" in main.py

We can see the issue: the ws module is imported twice, once as websocket.ws and once as app.websocket.ws.
Can someone explain that in an easy way? I can't ;-)
I feel this touches a complex aspect of Python that we usually don't want to bother with. Python is so simple in many other respects...


Solution
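
As background, the duplication can be reproduced in isolation. Python caches modules in sys.modules by their dotted name, not by file path, so a file that is reachable under two names is executed once per name. The sketch below is only an illustration: the folder and module names (demo_app, demo_ws, manager) are hypothetical stand-ins for app, websocket and websocket_manager, chosen to avoid clashing with installed packages.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway layout: <root>/demo_app/demo_ws/ws.py (hypothetical names).
root = Path(tempfile.mkdtemp())
pkg = root / "demo_app" / "demo_ws"
pkg.mkdir(parents=True)
(pkg / "__init__.py").write_text("")
(pkg / "ws.py").write_text("manager = object()  # stands in for websocket_manager\n")

# Putting BOTH the root and the inner demo_app folder on sys.path makes the
# same file reachable under two different dotted names.
sys.path[:0] = [str(root), str(root / "demo_app")]
importlib.invalidate_caches()

long_name = importlib.import_module("demo_app.demo_ws.ws")
short_name = importlib.import_module("demo_ws.ws")

# One file on disk, but two distinct module objects and two distinct instances.
same_file = Path(long_name.__file__).samefile(short_name.__file__)
print("same file:", same_file)
print("same module object:", long_name is short_name)
print("same manager object:", long_name.manager is short_name.manager)
```

If only root were on sys.path, the second import_module call would have nothing to resolve demo_ws against, which is the idea behind the fix described next.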

  • The correct way:

    • Add __init__.py to app/ to make app a regular package (instead of an implicit namespace package).
    • From PROJECT ROOT FOLDER, run app/main.py as a module with python -m app.main. (Don't run python app/main.py or similar.)

    This way it is impossible to import app.websocket as websocket, because it simply isn't reachable from sys.path under that name. Of the project folders, only PROJECT ROOT FOLDER is on sys.path, and websocket is not a direct child of it, so import websocket fails. a.py must then use from app.websocket.ws import websocket_manager (or the relative from .ws import websocket_manager), so every importer shares the same module object.
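
To catch a regression early, a small check can look for files that were loaded as more than one module object. This is a minimal sketch, not part of the original answer: find_duplicate_modules is a hypothetical helper name, and the demo feeds it fake module objects built with types.ModuleType rather than a real project.

```python
import os
import sys
import types
from collections import defaultdict


def find_duplicate_modules(modules=None):
    """Return {file path: [module names]} for files loaded as several module objects."""
    modules = sys.modules if modules is None else modules
    by_file = defaultdict(dict)  # real path -> {id(module): first name seen}
    for name, module in list(modules.items()):
        path = getattr(module, "__file__", None)
        if not isinstance(path, str):
            continue  # built-in modules, namespace packages, placeholder entries
        # Key on object identity so harmless aliases of one module are ignored.
        by_file[os.path.realpath(path)].setdefault(id(module), name)
    return {
        path: sorted(names.values())
        for path, names in by_file.items()
        if len(names) > 1
    }


# Demo with fake modules that mimic the situation from the question.
first = types.ModuleType("app.websocket.ws")
first.__file__ = "/project/app/websocket/ws.py"
second = types.ModuleType("websocket.ws")
second.__file__ = "/project/app/websocket/ws.py"
fake_cache = {"app.websocket.ws": first, "websocket.ws": second, "alias": first}

duplicates = find_duplicate_modules(fake_cache)
for names in duplicates.values():
    print("loaded more than once:", names)
```

Called without arguments (for example at the end of application startup or in a test), the helper inspects the real sys.modules. Note that running a file as a script and also importing it by name would be reported too, since that creates both __main__ and a second module object from the same file.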